Mostafa 3349cecc18 Add checkout_failure_limit config/feature (#911)
In a high-availability deployment of PgCat, a client may land on a PgCat container that is very busy with other clients. The new client can then be stuck indefinitely in a checkout failure loop because all server connections are in use by other clients. This is especially true in session mode pools with long-lived client connections (e.g. FDW connections).

One way to fix this issue is to close a client connection after it encounters some number of checkout failures. This forces the client to go through the network load balancer again, land on a different process/container, and try to check out a connection there. If that fails too, the client is disconnected again and tries another one.

This mechanism is guaranteed to eventually converge to a balanced state where all clients are able to find connections, provided that the overall number of connections across all containers matches the number of clients.
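
The convergence argument above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not PgCat's implementation: the function `simulate` and all of its parameters are hypothetical names, the load balancer is modeled as a uniform random choice, connections are held for the whole run (as in session mode), and every client initially lands on the same container to reproduce the skewed start.

```python
import random


def simulate(num_containers, pool_size, num_clients,
             checkout_failure_limit, seed=0, max_rounds=10_000):
    """Toy model: return (rounds, connections in use per container).

    Raises RuntimeError if some client is still waiting after max_rounds.
    """
    rng = random.Random(seed)
    in_use = [0] * num_containers    # connections held per container
    assigned = [0] * num_clients     # skewed start: everyone on container 0
    failures = [0] * num_clients     # consecutive checkout failures per client
    served = [False] * num_clients

    for round_no in range(1, max_rounds + 1):
        for client in range(num_clients):
            if served[client]:
                continue
            target = assigned[client]
            if in_use[target] < pool_size:
                # Checkout succeeds; the connection is held (session mode).
                in_use[target] += 1
                served[client] = True
            else:
                failures[client] += 1
                if failures[client] >= checkout_failure_limit:
                    # Disconnect: the client reconnects through the
                    # (assumed random) load balancer.
                    assigned[client] = rng.randrange(num_containers)
                    failures[client] = 0
        if all(served):
            return round_no, in_use
    raise RuntimeError("some clients never obtained a connection")


if __name__ == "__main__":
    rounds, in_use = simulate(num_containers=3, pool_size=4,
                              num_clients=12, checkout_failure_limit=2)
    print(rounds, in_use)
```

In this model, with 3 containers of 4 connections each and 12 clients, the only possible end state is every container fully used. If the limit is set so high that it is never reached, the clients that lost the initial race on container 0 stay stuck, which is the situation the commit describes.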

I was able to reproduce this issue in a controlled environment and to show that this PR fixes it.
2025-02-27 13:17:00 -06:00