Fix Back-and-forth RELOAD Bug (#330)

We identified a bug where RELOAD fails to update the pools. To reproduce it, start from some config state, modify that state, reload, revert the config to the original state, and reload again. The last reload fails to update the pool because PgCat "thinks" the pool state didn't change.

This happens because we use a HashSet to keep track of config hashes, but we never remove values from it. Say we start with State A, modify the pool configs to State B, and reload. The POOL_HASHES set now contains the hashes of both State A and State B. Going back to State A hits an existing entry in the set, which PgCat interprets as "configs are the same, no need to reload pools".

We fix this by attaching a config_hash value to the ConnectionPool object and calculating it when the pool is created, so a reload compares the new config against the config that pool was actually built from. This also eliminates the need for a global variable. One shortcoming is that changing any config under one user in the pool triggers a reload of the entire pool, which I think is acceptable.
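The difference between the two approaches can be modeled in a few lines. This is a minimal Ruby sketch, not PgCat's Rust code: the classes `GlobalHashReloader` and `PerPoolReloader` and their `reload` methods are hypothetical, a Ruby `Set` stands in for the POOL_HASHES HashSet, and Ruby's built-in `Hash#hash` stands in for whatever hashing PgCat applies to a pool config. It replays the A → B → A sequence from the reproduction steps and records whether each reload would rebuild the pool.

```ruby
require "set"

# Buggy model: one global set of every config hash ever seen, never pruned.
# (Hypothetical stand-in for the POOL_HASHES HashSet.)
class GlobalHashReloader
  def initialize
    @seen_hashes = Set.new
  end

  # Returns true when the pool would be rebuilt.
  def reload(pool_config)
    h = pool_config.hash
    # Any hash seen before counts as "no change", even if the live pool differs.
    return false if @seen_hashes.include?(h)

    @seen_hashes << h
    true
  end
end

# Fixed model: each pool stores the hash of the config it was built from.
class ConnectionPool
  attr_reader :config, :config_hash

  def initialize(config)
    @config = config
    @config_hash = config.hash # computed once, at pool creation
  end
end

class PerPoolReloader
  def initialize
    @pools = {} # pool name => ConnectionPool
  end

  # Rebuilds only when the new config differs from the one the current pool uses.
  # The hash covers the whole pool config, so any change (e.g. one user's
  # settings) rebuilds the entire pool.
  def reload(name, pool_config)
    current = @pools[name]
    return false if current && current.config_hash == pool_config.hash

    @pools[name] = ConnectionPool.new(pool_config)
    true
  end
end

state_a = { "servers" => [["localhost", 5432, "primary"]] }
state_b = { "servers" => [["localhost", 5432, "primary"], ["127.0.0.1", 5432, "replica"]] }

buggy = GlobalHashReloader.new
buggy_results = [state_a, state_b, state_a].map { |cfg| buggy.reload(cfg) }

fixed = PerPoolReloader.new
fixed_results = [state_a, state_b, state_a].map { |cfg| fixed.reload("sharded_db", cfg) }

p buggy_results # prints [true, true, false]
p fixed_results # prints [true, true, true]
```

In the buggy model the final reload back to State A is skipped because State A's hash is still in the set; in the per-pool model it is compared only against State B, the config the live pool was built from, so the pool is rebuilt. Reloading the same config twice in a row is still a no-op in the fixed model.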
2026-03-23 01:16:30 +00:00 · 2023-02-21 21:53:10 -06:00
parent 37e1c5297a
commit 75a7d4409a
5 changed files with 75 additions and 17 deletions
--- a/tests/ruby/misc_spec.rb
+++ b/tests/ruby/misc_spec.rb
@@ -8,6 +8,55 @@ describe "Miscellaneous" do
    processes.pgcat.shutdown
  end

+  context "when adding then removing instance using RELOAD" do
+    it "works correctly" do
+      admin_conn = PG::connect(processes.pgcat.admin_connection_string)
+
+      current_configs = processes.pgcat.current_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+
+      extra_replica = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].last.clone
+      extra_replica[0] = "127.0.0.1"
+      current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"] << extra_replica
+
+      processes.pgcat.update_config(current_configs) # with replica added
+      processes.pgcat.reload_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+
+      current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].pop
+
+      processes.pgcat.update_config(current_configs) # with replica removed again
+      processes.pgcat.reload_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+    end
+  end
+
+  context "when removing then adding instance back using RELOAD" do
+    it "works correctly" do
+      admin_conn = PG::connect(processes.pgcat.admin_connection_string)
+
+      current_configs = processes.pgcat.current_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+
+      removed_replica = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].pop
+      processes.pgcat.update_config(current_configs) # with replica removed
+      processes.pgcat.reload_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+
+      current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"] << removed_replica
+
+      processes.pgcat.update_config(current_configs) # with replica added again
+      processes.pgcat.reload_config
+      correct_count = current_configs["pools"]["sharded_db"]["shards"]["0"]["servers"].count
+      expect(admin_conn.async_exec("SHOW DATABASES").count).to eq(correct_count)
+    end
+  end
+
  describe "TCP Keepalives" do
    # Ideally, we should block TCP traffic to the database using
    # iptables to mimic passive (connection is dropped without a RST packet)