Tuning Galera Flow Control for OpenStack Databases

When an OpenStack control plane mysteriously freezes — instances stuck in BUILD, the API returning timeouts, but no service obviously down — my first stop is Galera flow control. The three-node MariaDB Galera cluster underneath Keystone, Nova, Neutron, and the rest is synchronous, and synchronous replication has a self-defense mechanism: if one node falls behind applying writes, it tells the whole cluster to pause until it catches up. That pause looks, from OpenStack’s side, exactly like the database hanging. Here’s how I read and tune Galera flow control for OpenStack’s specific write patterns, and how AI helps me spot a stalling node before it freezes the cluster.

Why Galera and OpenStack Are an Awkward Fit

OpenStack assumes a single logical database and writes to it constantly: every instance state change, every port update, every token validation that touches the DB. Galera gives you that single logical database across three nodes with synchronous replication, which is great for HA. The catch is that only commit ordering and certification are synchronous; each node applies the replicated write-sets on its own schedule, so the cluster ends up moving at the speed of its slowest node. If one node’s apply queue backs up (a slow disk, a long-running query, a stalled InnoDB flush), Galera throttles the entire cluster to let it catch up. That’s flow control, and it’s the mechanism most OpenStack operators have never looked at until it bites.

Check whether it’s happening right now:

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue_avg';

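wsrep_flow_control_paused is the fraction of time, from 0 to 1, that replication on this node has been paused by flow control since the counter was last reset. Anything meaningfully above zero means replication is throttling your control plane. wsrep_flow_control_sent counts the pause messages this node itself has sent, so the node where that number keeps climbing is the one asking everyone else to wait.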
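To look at the pause fraction over a specific window rather than since the last reset, one option is to sample a cumulative counter twice and take the difference. This is a minimal sketch, assuming your Galera build exposes wsrep_flow_control_paused_ns as cumulative nanoseconds spent paused; the function name and the sample numbers are mine, for illustration only.

```python
def pause_fraction(paused_ns_start: int, paused_ns_end: int, window_seconds: float) -> float:
    """Fraction of a sampling window that replication was paused by flow control.

    Both counter arguments are readings of wsrep_flow_control_paused_ns
    (assumed to be a cumulative nanosecond counter) taken window_seconds apart.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    delta_ns = paused_ns_end - paused_ns_start
    if delta_ns < 0:
        # The counter went backwards, e.g. the server restarted between samples.
        raise ValueError("counter decreased between samples; resample")
    # Clamp to 1.0 in case the two readings were not exactly window_seconds apart.
    return min(delta_ns / (window_seconds * 1e9), 1.0)


# Illustrative numbers only: 3 s of pause inside a 60 s window.
print(pause_fraction(12_000_000_000, 15_000_000_000, 60.0))
```

Three seconds of pause in a sixty-second window works out to 5%, which on a busy control plane is already enough to notice.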

Reading the Receive Queue

The leading indicator of trouble is wsrep_local_recv_queue_avg. It’s the average number of write-sets queued waiting to be applied on this node. A healthy node keeps this near zero; a node that’s falling behind shows it climbing. Flow control itself is triggered by the live queue length (wsrep_local_recv_queue): when that crosses the gcs.fc_limit threshold, the node sends a pause to the cluster.

SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';
SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';

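wsrep_cert_deps_distance is the average distance between the lowest and highest sequence numbers that could be applied in parallel, which makes it a rough measure of the parallelism available in the write-set stream. It informs how high you can usefully set wsrep_slave_threads to drain the queue faster: applier threads beyond that distance mostly sit idle.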
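As a way of turning that relationship into a number to review, here is a small sketch. The rule of thumb it encodes (stay at or below the observed cert_deps_distance, and at or below twice the core count) is my own assumption rather than anything from the Galera documentation, and suggest_applier_threads is a hypothetical helper name.

```python
import math


def suggest_applier_threads(cert_deps_distance: float, cpu_cores: int) -> int:
    """Suggest a wsrep_slave_threads value to review, never to apply blindly.

    Assumed rule of thumb: do not exceed the observed cert_deps_distance
    (no parallelism left beyond it) or twice the number of CPU cores.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    parallelism = math.floor(cert_deps_distance)
    # Always keep at least one applier thread.
    return max(1, min(parallelism, 2 * cpu_cores))


# Plenty of parallelism in the stream, so the core count is the limit here.
print(suggest_applier_threads(cert_deps_distance=40.0, cpu_cores=4))
```

With a distance of 40 and 4 cores the cap comes from the cores; with a distance near 2, adding threads would not buy anything no matter how large the machine is.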

Prompt: “Here are wsrep_flow_control_paused, wsrep_flow_control_sent, wsrep_local_recv_queue_avg, and wsrep_cert_deps_distance from all three Galera nodes, plus my current wsrep_slave_threads and disk type per node. Identify which node is triggering flow control, whether it’s apply-bound or disk-bound, and whether raising slave_threads would help given the cert_deps_distance. Show the per-node comparison as a table. Don’t tell me to change live config — propose values for me to review.”

Output: A table showing node-2 with a recv queue 8x the others and wsrep_flow_control_sent almost entirely originating there, traced to its slower disk. It noted cert_deps_distance was high enough that raising wsrep_slave_threads from 1 to 4 on node-2 would likely help drain the queue, while warning that the real fix was matching node-2’s disk to its peers.

That per-node comparison is exactly where AI saves time — it reads four counters across three nodes and points at the culprit faster than I’d eyeball it. The model is a fast junior engineer; I confirmed node-2’s disk was genuinely slower with iostat before changing wsrep_slave_threads, because tuning the symptom without confirming the cause just moves the stall around.
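The same comparison is easy to script as a sanity check on whatever the model reports. This is a rough sketch: find_flow_control_culprit is a hypothetical helper, and the counters in the example are made up to resemble the node-2 case above, not real measurements.

```python
from statistics import median


def find_flow_control_culprit(nodes: dict[str, dict[str, float]]):
    """Return the node sending the most flow-control pauses, or None if nobody is.

    nodes maps a node name to its status counters. The result reports that
    node's share of all pause messages and its recv queue average relative
    to the median of its peers.
    """
    total_sent = sum(c["wsrep_flow_control_sent"] for c in nodes.values())
    if total_sent == 0:
        return None
    culprit = max(nodes, key=lambda name: nodes[name]["wsrep_flow_control_sent"])
    peer_queues = [c["wsrep_local_recv_queue_avg"]
                   for name, c in nodes.items() if name != culprit]
    peer_median = median(peer_queues) if peer_queues else 0.0
    own_queue = nodes[culprit]["wsrep_local_recv_queue_avg"]
    return {
        "node": culprit,
        "sent_share": nodes[culprit]["wsrep_flow_control_sent"] / total_sent,
        # None when the peers have empty queues and a ratio is undefined.
        "recv_queue_vs_peers": own_queue / peer_median if peer_median > 0 else None,
    }


# Made-up counters shaped like the node-2 case described above.
sample = {
    "node-1": {"wsrep_flow_control_sent": 2, "wsrep_local_recv_queue_avg": 0.5},
    "node-2": {"wsrep_flow_control_sent": 96, "wsrep_local_recv_queue_avg": 4.0},
    "node-3": {"wsrep_flow_control_sent": 2, "wsrep_local_recv_queue_avg": 0.5},
}
print(find_flow_control_culprit(sample))
```

This only tells you which node is asking for pauses. Whether that node is disk-bound or apply-bound still has to come from iostat and the slow-query log.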

Tuning the Knobs That Matter

A few settings directly affect flow control behavior:

[galera]
wsrep_slave_threads = 4
wsrep_provider_options = "gcs.fc_limit=64;gcs.fc_factor=0.8"
innodb_flush_log_at_trx_commit = 2
innodb_buffer_pool_size = 12G

wsrep_slave_threads lets a node apply write-sets in parallel (bounded by cert_deps_distance). gcs.fc_limit sets how many write-sets can queue before flow control kicks in; raising it gives bursts more headroom but increases how far a node can lag. gcs.fc_factor sets how far the queue must drain, as a fraction of that limit, before replication resumes. innodb_flush_log_at_trx_commit=2 flushes the redo log to disk about once per second instead of at every commit, trading a little durability for much faster commits, which for an OpenStack control-plane DB is often the right call given the data is reconstructible.
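To see what queue lengths a given wsrep_provider_options string translates to, the sketch below parses the string and computes the pause and resume thresholds. Both function names are hypothetical. One loud assumption: Galera is usually described as scaling the limit by the square root of the node count when gcs.fc_master_slave is left at NO, and I have modelled that here, so verify it against your Galera version before relying on the scaled numbers.

```python
import math


def parse_provider_options(raw: str) -> dict[str, str]:
    """Split a wsrep_provider_options string into a key/value dict."""
    options = {}
    for item in raw.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip empty fragments such as a trailing ';'
            options[key.strip()] = value.strip()
    return options


def flow_control_thresholds(raw_options: str, cluster_size: int = 1) -> tuple[int, int]:
    """Return (pause_above, resume_below) as recv-queue lengths.

    ASSUMPTION: the limit is scaled by sqrt(cluster_size), as commonly
    described for gcs.fc_master_slave=NO. Pass cluster_size=1 to get the
    unscaled, as-configured values. Results are rounded to whole write-sets.
    """
    opts = parse_provider_options(raw_options)
    limit = float(opts["gcs.fc_limit"])
    factor = float(opts["gcs.fc_factor"])
    pause_above = int(limit * math.sqrt(cluster_size) + 0.5)
    resume_below = int(pause_above * factor + 0.5)
    return pause_above, resume_below


opts = "gcs.fc_limit=64;gcs.fc_factor=0.8"
print(flow_control_thresholds(opts))                  # as configured
print(flow_control_thresholds(opts, cluster_size=3))  # with the assumed scaling
```

The gap between the two thresholds is the hysteresis: a factor close to 1 resumes replication almost immediately, while a lower factor makes the lagging node drain more of its queue first.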

Pro Tip: Don’t just raise gcs.fc_limit to make flow-control pauses disappear. That hides the lag rather than fixing it, and a node that’s allowed to fall far behind takes a long, painful catch-up if it ever needs an SST. Fix the slow node first; tune the limit second.

Watching for the Real Culprits

OpenStack creates two write patterns that stress Galera. The first is hot, tiny, frequent writes (Nova instance state, Neutron port updates), which benefit from parallel apply threads. The second is the occasional large transaction or schema migration during an upgrade, which can blow past the flow-control limit and pause everything. When I’m chasing an intermittent freeze, I’ll hand the Galera status counters and the timing of the OpenStack symptom to Claude and ask it to correlate the flow-control pauses with what the control plane was doing. That correlation is a strong lead; I verify it against the actual slow-query log and iostat before tuning.
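The correlation step itself is simple enough to sketch, which helps when checking the model's answer by hand. Everything here is illustrative: correlate_pauses is a hypothetical helper, the 60-second window and 0.1 threshold are arbitrary defaults, and the timeline is invented.

```python
def correlate_pauses(samples, events, window_s=60.0, threshold=0.1):
    """Pair each control-plane event with the worst nearby pause fraction.

    samples: list of (epoch_seconds, pause_fraction) readings from one node.
    events:  list of (epoch_seconds, label) symptoms or operations.
    Returns (label, peak_fraction) for events whose peak reaches threshold.
    """
    hits = []
    for event_ts, label in events:
        # Keep only the readings taken within window_s of the event.
        nearby = [frac for ts, frac in samples if abs(ts - event_ts) <= window_s]
        peak = max(nearby, default=0.0)
        if peak >= threshold:
            hits.append((label, peak))
    return hits


# Hypothetical timeline: one pause spike close to a schema migration.
samples = [(1000, 0.0), (1060, 0.02), (1120, 0.45), (1180, 0.01)]
events = [(1100, "schema migration"), (1500, "token cleanup")]
print(correlate_pauses(samples, events))
```

An event that lines up with a spike is a lead, not a verdict; an event with no nearby samples is simply reported as no hit, so sample often enough to cover the periods you care about.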

Galera tuning is iterative and easy to overshoot. Change one parameter on one node, reset the status counters, and watch wsrep_flow_control_paused and the recv queue under real OpenStack load:

FLUSH STATUS;   -- reset counters, then observe under load
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

If the pause fraction drops and the recv queue stays low, you helped. If not, you found that out on one node instead of three. Never tune all three nodes at once, because you lose the untouched nodes as a baseline to compare against.
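If you record the counters before and after a change, the "did it help" check can be written down so it is applied the same way every time. A minimal sketch, assuming both readings cover comparable load; change_helped and the 20% noise guard are my own choices, not a Galera convention.

```python
def change_helped(before: dict, after: dict, min_drop: float = 0.2) -> bool:
    """True if the pause fraction fell by at least min_drop (relative)
    and the recv queue average did not grow.

    before/after hold wsrep_flow_control_paused and
    wsrep_local_recv_queue_avg from the one node that was changed.
    """
    paused_before = before["wsrep_flow_control_paused"]
    paused_after = after["wsrep_flow_control_paused"]
    if paused_before == 0:
        return False  # nothing to improve, so the change cannot have helped
    relative_drop = (paused_before - paused_after) / paused_before
    queue_ok = (after["wsrep_local_recv_queue_avg"]
                <= before["wsrep_local_recv_queue_avg"])
    return relative_drop >= min_drop and queue_ok


# Illustrative readings from one node, before and after a single change.
before = {"wsrep_flow_control_paused": 0.10, "wsrep_local_recv_queue_avg": 3.0}
after = {"wsrep_flow_control_paused": 0.02, "wsrep_local_recv_queue_avg": 0.8}
print(change_helped(before, after))
```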

Conclusion

Galera flow control is the invisible hand that freezes an OpenStack control plane when one database node falls behind, and most operators don’t look at it until it’s already biting. The diagnostic discipline is to find the lagging node from the recv queue and flow-control counters, confirm whether it’s disk-bound or apply-bound, and fix the cause before reaching for gcs.fc_limit. AI is genuinely fast at the multi-node counter comparison and at correlating pauses with control-plane activity, and both are real time-savers. But every conclusion gets verified with iostat and the slow-query log before you tune, and every change gets watched on one node under load. The model reads the counters; you confirm the cause and tune deliberately.