Improve Config Documentation (#351)

This PR adds a utility script that generates config documentation from pgcat.toml. Ideally, we'd generate the docs directly from config.rs, where the actual defaults are set, but this is a good start because several config flags were previously undocumented.
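
As a rough illustration of the approach (a simplified sketch, not the script added in this PR): walk the TOML file line by line, collect `#` comment lines, and attach them to the next `key = value` line. A commented-out `# key = value` line is treated as an unset option whose value is reported as an example. The helper name `extract_entries` and the sample text are hypothetical.

```python
import re

# Matches "key = value", optionally commented out with a leading "#".
FIELD_RE = re.compile(r"^(#)?\s*([A-Za-z0-9_]+)\s*=\s*(.+)$")


def extract_entries(toml_text):
    """Pair each setting with the comment lines directly above it (hypothetical helper)."""
    entries, section, comment = [], "", []
    for raw in toml_text.splitlines():
        line = raw.strip()
        if not line:
            comment = []  # in this sketch, a blank line detaches pending comments
            continue
        if line.startswith("["):
            section = line[1:-1]
            comment = []
            continue
        match = FIELD_RE.match(line)
        if match:
            unset = match.group(1) is not None  # commented-out setting
            value = match.group(3).strip()
            entries.append({
                "path": f"{section}.{match.group(2)}",
                "default": "<UNSET>" if unset else value,
                "example": value if unset else None,
                "comment": "\n".join(comment),
            })
            comment = []
        elif line.startswith("#"):
            comment.append(line[1:].strip())
    return entries


sample = """
[general]
# Port to run on.
port = 6432

# Path to TLS certificate file.
# tls_certificate = "server.cert"
"""

for entry in extract_entries(sample):
    print(entry["path"], entry["default"], entry["example"])
```

Like the script in this PR, the sketch keeps any trailing inline comment (for example `# milliseconds`) as part of the value, and a prose comment that happens to look like `word = something` would be mistaken for a setting.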
2026-05-31 23:19:05 +00:00 · 2023-03-10 22:00:28 -06:00
parent 0704ea089c
commit b09f0a3e6b
4 changed files with 490 additions and 72 deletions
@@ -0,0 +1,340 @@
+# PgCat Configurations 
+## `general` Section
+
+### host
+```
+path: general.host
+default: "0.0.0.0"
+```
+
+What IP to run on, 0.0.0.0 means accessible from everywhere.
+
+### port
+```
+path: general.port
+default: 6432
+```
+
+Port to run on, same as PgBouncer used in this example.
+
+### enable_prometheus_exporter
+```
+path: general.enable_prometheus_exporter
+default: true
+```
+
+Whether to enable prometheus exporter or not.
+
+### prometheus_exporter_port
+```
+path: general.prometheus_exporter_port
+default: 9930
+```
+
+Port on which the Prometheus exporter listens.
+
+### connect_timeout
+```
+path: general.connect_timeout
+default: 5000 # milliseconds
+```
+
+How long to wait before aborting a server connection (ms).
+
+### idle_timeout
+```
+path: general.idle_timeout
+default: 30000 # milliseconds
+```
+
+How long an idle connection with a server is left open (ms).
+
+### healthcheck_timeout
+```
+path: general.healthcheck_timeout
+default: 1000 # milliseconds
+```
+
+How much time to give the health check query to return with a result (ms).
+
+### healthcheck_delay
+```
+path: general.healthcheck_delay
+default: 30000 # milliseconds
+```
+
+How long to keep a connection available for immediate re-use without running a healthcheck query on it (ms).
+
+### shutdown_timeout
+```
+path: general.shutdown_timeout
+default: 60000 # milliseconds
+```
+
+How much time to give clients during shutdown before forcibly killing client connections (ms).
+
+### ban_time
+```
+path: general.ban_time
+default: 60 # seconds
+```
+
+How long to ban a server if it fails a health check (seconds).
+
+### log_client_connections
+```
+path: general.log_client_connections
+default: false
+```
+
+Whether to log client connections.
+
+### log_client_disconnections
+```
+path: general.log_client_disconnections
+default: false
+```
+
+Whether to log client disconnections.
+
+### autoreload
+```
+path: general.autoreload
+default: false
+```
+
+When set to true, PgCat reloads configs if it detects a change in the config file.
+
+### worker_threads
+```
+path: general.worker_threads
+default: 5
+```
+
+Number of worker threads the runtime will use (4 when unset; the example config sets 5).
+
+### tcp_keepalives_idle
+```
+path: general.tcp_keepalives_idle
+default: 5
+```
+
+Number of seconds of connection idleness to wait before sending a keepalive packet to the server.
+
+### tcp_keepalives_count
+```
+path: general.tcp_keepalives_count
+default: 5
+```
+
+Number of unacknowledged keepalive packets allowed before giving up and closing the connection.
+
+### tcp_keepalives_interval
+```
+path: general.tcp_keepalives_interval
+default: 5
+```
+
+Number of seconds between keepalive packets.
+
+### tls_certificate
+```
+path: general.tls_certificate
+default: <UNSET>
+example: "server.cert"
+```
+
+Path to the TLS certificate file to use for TLS connections.
+
+### tls_private_key
+```
+path: general.tls_private_key
+default: <UNSET>
+example: "server.key"
+```
+
+Path to TLS private key file to use for TLS connections
+
+### admin_username
+```
+path: general.admin_username
+default: "admin_user"
+```
+
+User name to access the virtual administrative database (pgbouncer or pgcat).
+Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc.
+
+### admin_password
+```
+path: general.admin_password
+default: "admin_pass"
+```
+
+Password to access the virtual administrative database
+
+## `pools.<pool_name>` Section
+
+### pool_mode
+```
+path: pools.<pool_name>.pool_mode
+default: "transaction"
+```
+
+Pool mode (see PgBouncer docs for more).
+`session` one server connection per connected client
+`transaction` one server connection per client transaction
+
+### load_balancing_mode
+```
+path: pools.<pool_name>.load_balancing_mode
+default: "random"
+```
+
+Load balancing mode
+`random` selects the server at random
+`loc` selects the server with the least outstanding busy connections
+
+### default_role
+```
+path: pools.<pool_name>.default_role
+default: "any"
+```
+
+If the client doesn't specify, PgCat routes traffic to this role by default.
+`any` round-robin between primary and replicas,
+`replica` round-robin between replicas only without touching the primary,
+`primary` all queries go to the primary unless otherwise specified.
+
+### query_parser_enabled
+```
+path: pools.<pool_name>.query_parser_enabled
+default: true
+```
+
+If Query Parser is enabled, we'll attempt to parse
+every incoming query to determine if it's a read or a write.
+If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
+we'll direct it to the primary.
+
+### primary_reads_enabled
+```
+path: pools.<pool_name>.primary_reads_enabled
+default: true
+```
+
+If the query parser is enabled and this setting is enabled, the primary will be part of the pool of databases used for
+load balancing of read queries. Otherwise, the primary will only be used for write
+queries. The primary can always be explicitly selected with our custom protocol.
+
+### sharding_key_regex
+```
+path: pools.<pool_name>.sharding_key_regex
+default: <UNSET>
+example: '/\* sharding_key: (\d+) \*/'
+```
+
+Allow sharding commands to be passed as statement comments instead of
+separate commands. If these are unset this functionality is disabled.
+
+### sharding_function
+```
+path: pools.<pool_name>.sharding_function
+default: "pg_bigint_hash"
+```
+
+So what if you wanted to implement a different hashing function,
+or you've already built one and you want this pooler to use it?
+Current options:
+`pg_bigint_hash`: PARTITION BY HASH (Postgres hashing function)
+`sha1`: A hashing function based on SHA1
+
+### automatic_sharding_key
+```
+path: pools.<pool_name>.automatic_sharding_key
+default: <UNSET>
+example: "data.id"
+```
+
+Automatically parse this from queries and route queries to the right shard!
+
+### idle_timeout
+```
+path: pools.<pool_name>.idle_timeout
+default: 40000
+```
+
+Idle timeout can be overwritten in the pool
+
+### connect_timeout
+```
+path: pools.<pool_name>.connect_timeout
+default: 3000
+```
+
+Connect timeout can be overwritten in the pool
+
+## `pools.<pool_name>.users.<user_index>` Section
+
+### username
+```
+path: pools.<pool_name>.users.<user_index>.username
+default: "sharding_user"
+```
+
+PostgreSQL username
+
+### password
+```
+path: pools.<pool_name>.users.<user_index>.password
+default: "sharding_user"
+```
+
+PostgreSQL password
+
+### pool_size
+```
+path: pools.<pool_name>.users.<user_index>.pool_size
+default: 9
+```
+
+Maximum number of server connections that can be established for this user.
+The maximum number of connections from a single PgCat process to any database in the cluster
+is the sum of pool_size across all users.
+
+### statement_timeout
+```
+path: pools.<pool_name>.users.<user_index>.statement_timeout
+default: 0
+```
+
+Maximum query duration. Dangerous, but protects against DBs that died in a non-obvious way.
+0 means it is disabled.
+
+## `pools.<pool_name>.shards.<shard_index>` Section
+
+### servers
+```
+path: pools.<pool_name>.shards.<shard_index>.servers
+default: [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
+```
+
+Array of servers in the shard, each server entry is an array of `[host, port, role]`
+
+### mirrors
+```
+path: pools.<pool_name>.shards.<shard_index>.mirrors
+default: <UNSET>
+example: [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]
+```
+
+Array of mirrors for the shard, each mirror entry is an array of `[host, port, index of server in servers array]`
+Traffic hitting the server identified by the index will be sent to the mirror.
+
+### database
+```
+path: pools.<pool_name>.shards.<shard_index>.database
+default: "shard0"
+```
+
+Database name (e.g. "postgres")
+
@@ -39,35 +39,7 @@ PGPASSWORD=postgres psql -h 127.0.0.1 -p 6432 -U postgres -c 'SELECT 1'

 ### Config

-| **Name**                     | **Description**                                                                                                                            | **Examples**                     |
-|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------|
-| **`general`**                |                                                                                                                                            |                                  |
-| `host`                       | The pooler will run on this host, 0.0.0.0 means accessible from everywhere.                                                                | `0.0.0.0`                        |
-| `port`                       | The pooler will run on this port.                                                                                                          | `6432`                           |
-| `enable_prometheus_exporter` | Enable prometheus exporter which will export metrics in prometheus exposition format.                                                      | `true`                           |
-| `prometheus_exporter_port`   | Port at which prometheus exporter listens on.                                                                                              | `9930`                           |
-| `pool_size`                  | Maximum allowed server connections per pool. Pools are separated for each user/shard/server role. The connections are allocated as needed. | `15`                             |
-| `pool_mode`                  | The pool mode to use, i.e. `session` or `transaction`.                                                                                     | `transaction`                    |
-| `connect_timeout`            | Maximum time to establish a connection to a server (milliseconds). If reached, the server is banned and the next target is attempted.      | `5000`                           |
-| `healthcheck_timeout`        | Maximum time to pass a health check (`SELECT 1`, milliseconds). If reached, the server is banned and the next target is attempted.         | `1000`                           |
-| `shutdown_timeout`           | Maximum time to give clients during shutdown before forcibly killing client connections (ms).                                              | `60000`                          |
-| `healthcheck_delay`          | How long to keep connection available for immediate re-use, without running a healthcheck query on it                                      | `30000`                          |
-| `ban_time`                   | Ban time for a server (seconds). It won't be allowed to serve transactions until the ban expires; failover targets will be used instead.   | `60`                             |
-| `autoreload`                 | Enable auto-reload of config after fixed time-interval.                                                                                    | `false`                          |
-|                              |                                                                                                                                            |                                  |
-| **`user`**                   |                                                                                                                                            |                                  |
-| `name`                       | The user name.                                                                                                                             | `sharding_user`                  |
-| `password`                   | The user password in plaintext.                                                                                                            | `hunter2`                        |
-| `statement_timeout` | Timeout in milliseconds for how long a query takes to execute | `0 (disabled)`
-|                              |                                                                                                                                            |                                  |
-| **`shards`**                 | Shards are numerically numbered starting from 0; the order in the config is preserved by the pooler to route queries accordingly.          | `[shards.0]`                     |
-| `servers`                    | List of servers to connect to and their roles. A server is: `[host, port, role]`, where `role` is either `primary` or `replica`.           | `["127.0.0.1", 5432, "primary"]` |
-| `database`                   | The name of the database to connect to. This is the same on all servers that are part of one shard.                                        |                                  |
-|                              |                                                                                                                                            |                                  |
-| **`query_router`**           |                                                                                                                                            |                                  |
-| `default_role`               | Traffic is routed to this role by default (random), unless the client specifies otherwise. Default is `any`, for any role available.  | `any`, `primary`, `replica`      |
-| `query_parser_enabled`       | Enable the query parser which will inspect incoming queries and route them to a primary or replicas.                                       | `false`                          |
-| `primary_reads_enabled`      | Enable this to allow read queries on the primary; otherwise read queries are routed to the replicas.                                       | `true`                           |
+[See Configurations page](https://github.com/levkk/pgcat/blob/main/CONFIG.md)

 ## Local development

@@ -18,21 +18,21 @@ enable_prometheus_exporter = true
 prometheus_exporter_port = 9930

 # How long to wait before aborting a server connection (ms).
-connect_timeout = 5000
+connect_timeout = 5000 # milliseconds

 # How long an idle connection with a server is left open (ms).
-idle_timeout = 30000
+idle_timeout = 30000 # milliseconds

 # How much time to give the health check query to return with a result (ms).
-healthcheck_timeout = 1000
+healthcheck_timeout = 1000 # milliseconds

 # How long to keep connection available for immediate re-use, without running a healthcheck query on it
-healthcheck_delay = 30000
+healthcheck_delay = 30000 # milliseconds

 # How much time to give clients during shutdown before forcibly killing client connections (ms).
-shutdown_timeout = 60000
+shutdown_timeout = 60000 # milliseconds

-# For how long to ban a server if it fails a health check (seconds).
+# How long to ban a server if it fails a health check (seconds).
 ban_time = 60 # seconds

 # If we should log client connections
@@ -41,40 +41,52 @@ log_client_connections = false
 # If we should log client disconnections
 log_client_disconnections = false

-# Reload config automatically if it changes.
+# When set to true, PgCat reloads configs if it detects a change in the config file.
 autoreload = false

 # Number of worker threads the Runtime will use (4 by default).
 worker_threads = 5

-# TLS
+# Number of seconds of connection idleness to wait before sending a keepalive packet to the server.
+tcp_keepalives_idle = 5
+# Number of unacknowledged keepalive packets allowed before giving up and closing the connection.
+tcp_keepalives_count = 5
+# Number of seconds between keepalive packets.
+tcp_keepalives_interval = 5
+
+# Path to the TLS certificate file to use for TLS connections.
 # tls_certificate = "server.cert"
+# Path to TLS private key file to use for TLS connections
 # tls_private_key = "server.key"

-# Credentials to access the virtual administrative database (pgbouncer or pgcat)
+# User name to access the virtual administrative database (pgbouncer or pgcat)
 # Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc..
 admin_username = "admin_user"
+# Password to access the virtual administrative database
 admin_password = "admin_pass"

-# pool
-# configs are structured as pool.<pool_name>
-# the pool_name is what clients use as database name when connecting
-# For the example below a client can connect using "postgres://sharding_user:sharding_user@pgcat_host:pgcat_port/sharded_db"
+# Pool configs are structured as pools.<pool_name>;
+# the pool_name is what clients use as the database name when connecting.
+# For a pool named `sharded_db`, clients access that pool using a connection string like
+# `postgres://sharding_user:sharding_user@pgcat_host:pgcat_port/sharded_db`
 [pools.sharded_db]
 # Pool mode (see PgBouncer docs for more).
-# session: one server connection per connected client
-# transaction: one server connection per client transaction
+# `session` one server connection per connected client
+# `transaction` one server connection per client transaction
 pool_mode = "transaction"

-# If the client doesn't specify, route traffic to
-# this role by default.
-#
-# any: round-robin between primary and replicas,
-# replica: round-robin between replicas only without touching the primary,
-# primary: all queries go to the primary unless otherwise specified.
+# Load balancing mode
+# `random` selects the server at random
+# `loc` selects the server with the least outstanding busy connections
+load_balancing_mode = "random"
+
+# If the client doesn't specify, PgCat routes traffic to this role by default.
+# `any` round-robin between primary and replicas,
+# `replica` round-robin between replicas only without touching the primary,
+# `primary` all queries go to the primary unless otherwise specified.
 default_role = "any"

-# Query parser. If enabled, we'll attempt to parse
+# If Query Parser is enabled, we'll attempt to parse
 # every incoming query to determine if it's a read or a write.
 # If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
 # we'll direct it to the primary.
@@ -93,23 +105,26 @@ primary_reads_enabled = true

 # So what if you wanted to implement a different hashing function,
 # or you've already built one and you want this pooler to use it?
-#
 # Current options:
-#
-# pg_bigint_hash: PARTITION BY HASH (Postgres hashing function)
-# sha1: A hashing function based on SHA1
-#
+# `pg_bigint_hash`: PARTITION BY HASH (Postgres hashing function)
+# `sha1`: A hashing function based on SHA1
 sharding_function = "pg_bigint_hash"

 # Automatically parse this from queries and route queries to the right shard!
-automatic_sharding_key = "data.id"
+# automatic_sharding_key = "data.id"

 # Idle timeout can be overwritten in the pool
 idle_timeout = 40000

-# Credentials for users that may connect to this cluster
+# Connect timeout can be overwritten in the pool
+connect_timeout = 3000
+
+# User configs are structured as pools.<pool_name>.users.<user_index>
+# This section holds the credentials for users that may connect to this cluster
 [pools.sharded_db.users.0]
+# PostgreSQL username
 username = "sharding_user"
+# PostgreSQL password
 password = "sharding_user"
 # Maximum number of server connections that can be established for this user
 # The maximum number of connection from a single Pgcat process to any database in the cluster
@@ -117,6 +132,7 @@ password = "sharding_user"
 pool_size = 9

 # Maximum query duration. Dangerous, but protects against DBs that died in a non-obvious way.
+# 0 means it is disabled.
 statement_timeout = 0

 [pools.sharded_db.users.1]
@@ -125,28 +141,26 @@ password = "other_user"
 pool_size = 21
 statement_timeout = 15000

-# Shard 0
+# Shard configs are structured as pools.<pool_name>.shards.<shard_id>
+# Each shard config contains a list of servers that make up the shard
+# and the database name to use.
 [pools.sharded_db.shards.0]
-# [ host, port, role ]
-servers = [
-    [ "127.0.0.1", 5432, "primary" ],
-    [ "localhost", 5432, "replica" ]
-]
+# Array of servers in the shard, each server entry is an array of `[host, port, role]`
+servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
+
+# Array of mirrors for the shard, each mirror entry is an array of `[host, port, index of server in servers array]`
+# Traffic hitting the server identified by the index will be sent to the mirror.
+# mirrors = [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]
+
 # Database name (e.g. "postgres")
 database = "shard0"

 [pools.sharded_db.shards.1]
-servers = [
-    [ "127.0.0.1", 5432, "primary" ],
-    [ "localhost", 5432, "replica" ],
-]
+servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
 database = "shard1"

 [pools.sharded_db.shards.2]
-servers = [
-    [ "127.0.0.1", 5432, "primary" ],
-    [ "localhost", 5432, "replica" ],
-]
+servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
 database = "shard2"


@@ -0,0 +1,92 @@
+import re
+import tomli
+
+class DocGenerator:
+    def __init__(self, filename):
+        self.doc = []
+        self.current_section = ""
+        self.current_comment = []
+        self.current_field_name = ""
+        self.current_field_value = []
+        self.current_field_unset = False
+        self.filename = filename
+
+    def write(self):
+        with open("../CONFIG.md", "w") as text_file:
+            text_file.write("# PgCat Configurations \n")
+            for entry in self.doc:
+                if entry["name"] == "__section__":
+                    text_file.write("## `" + entry["section"] + "` Section" + "\n")
+                    text_file.write("\n")
+                    continue
+                text_file.write("### " + entry["name"]+ "\n")
+                text_file.write("```"+ "\n")
+                text_file.write("path: " + entry["fqdn"]+ "\n")
+                text_file.write("default: " + entry["defaults"].strip()+ "\n")
+                if entry["example"] is not None:
+                    text_file.write("example: " + entry["example"].strip()+ "\n")
+                text_file.write("```"+ "\n")
+                text_file.write("\n")
+                text_file.write(entry["comment"]+ "\n")
+                text_file.write("\n")
+
+    def save_entry(self):
+        if len(self.current_field_name) == 0:
+            return
+        if len(self.current_comment) == 0:
+            return
+        self.current_section = self.current_section.replace("sharded_db", "<pool_name>")
+        self.current_section = self.current_section.replace("simple_db", "<pool_name>")
+        self.current_section = self.current_section.replace("users.0", "users.<user_index>")
+        self.current_section = self.current_section.replace("users.1", "users.<user_index>")
+        self.current_section = self.current_section.replace("shards.0", "shards.<shard_index>")
+        self.current_section = self.current_section.replace("shards.1", "shards.<shard_index>")
+        self.doc.append(
+            {
+                "name": self.current_field_name,
+                "fqdn": self.current_section + "." + self.current_field_name,
+                "section": self.current_section,
+                "comment": "\n".join(self.current_comment),
+                "defaults": self.current_field_value if not self.current_field_unset else "<UNSET>",
+                "example": self.current_field_value  if self.current_field_unset  else None
+            }
+        )
+        self.current_comment = []
+        self.current_field_name = ""
+        self.current_field_value = []
+    def parse(self):
+        with open(self.filename, "r") as f:
+            for line in f:
+                line = line.strip()
+                if len(line) == 0:
+                    self.save_entry()
+
+                if line.startswith("["):
+                    self.current_section = line[1:-1]
+                    self.current_field_name = "__section__"
+                    self.current_field_unset = False
+                    self.save_entry()
+
+                elif line.startswith("#"):
+                    results = re.search(r"^#\s*([A-Za-z0-9_]+)\s*=(.+)$", line)
+                    if results is not None:
+                        self.current_field_name = results.group(1)
+                        self.current_field_value = results.group(2)
+                        self.current_field_unset = True
+                        self.save_entry()
+                    else:
+                        self.current_comment.append(line[1:].strip())
+                else:
+                    results = re.search(r"^\s*([A-Za-z0-9_]+)\s*=(.+)$", line)
+                    if results is None:
+                        continue
+                    self.current_field_name = results.group(1)
+                    self.current_field_value = results.group(2)
+                    self.current_field_unset = False
+                    self.save_entry()
+        self.save_entry()
+        return self
+
+
+DocGenerator("../pgcat.toml").parse().write()
+
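
The `pool_size` entry in CONFIG.md above states that the maximum number of connections from a single PgCat process to any database in the cluster is the sum of `pool_size` across all users. A minimal sketch of that arithmetic, using the two `pool_size` values from the example pgcat.toml in this diff (9 and 21); the helper name `max_server_connections` is hypothetical and is not part of PgCat:

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API, as imported by the PR's script

# Trimmed-down fragment of the example config; only pool_size is kept.
SAMPLE = """
[pools.sharded_db.users.0]
pool_size = 9

[pools.sharded_db.users.1]
pool_size = 21
"""


def max_server_connections(config, pool_name):
    """Sum pool_size over every user of one pool (hypothetical helper)."""
    users = config["pools"][pool_name]["users"]
    return sum(user["pool_size"] for user in users.values())


config = tomllib.loads(SAMPLE)
total = max_server_connections(config, "sharded_db")
print(total)  # 9 + 21 = 30
```

Note that the parsed user indexes are string keys (`"0"`, `"1"`), because `[pools.sharded_db.users.0]` is a dotted table name rather than an array.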