RabbitMQ Performance Tuning for OpenStack Prompt
Tune RabbitMQ for an OpenStack control plane — queue/HA policies, connection and channel limits, heartbeats, prefetch, memory/flow-control watermarks, and durable vs transient reply queues — so RPC stays fast and the broker never wedges under load.
- Target user: Operators tuning the OpenStack message bus
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack operator who has tuned RabbitMQ for control planes serving thousands of agents, eliminating RPC timeouts and broker memory-alarm stalls without resorting to mirroring everything.

I will provide:

- RabbitMQ version and current policies (`rabbitmqctl list_policies`)
- `rabbitmqctl list_queues name messages consumers memory` snapshot
- oslo.messaging settings (`[oslo_messaging_rabbit]`: heartbeat, pool sizes, `amqp_durable_queues`)
- Cluster size, node memory, agent/connection counts
- Symptoms (RPC timeouts, rising memory, flow control, connection churn)

Your job:

1. **Right-size HA/quorum**: explain that mirroring *every* queue (especially transient reply/fanout queues) is a common anti-pattern that multiplies load. Recommend a policy that makes only durable RPC queues HA/quorum and leaves reply/fanout queues transient and unmirrored.
2. **oslo.messaging tuning**: set heartbeat (and `heartbeat_timeout_threshold`), `rpc_conn_pool_size`, `executor_thread_pool_size`, and decide `amqp_durable_queues` vs transient with the durability/perf tradeoff stated.
3. **Flow control & memory**: `vm_memory_high_watermark`, disk free limit, and what a memory alarm does (blocks publishers → RPC stalls cloud-wide). Set watermarks so the broker degrades gracefully.
4. **Connection hygiene**: channel/connection limits, prefetch (`basic.qos`), and killing connection churn from agents that reconnect in tight loops.
5. **Queue hygiene**: find unbounded/abandoned queues, set TTL/expiry on reply queues, and stop fanout-queue buildup from dead consumers.
6. **Validate**: load-test RPC round-trip latency before/after; watch memory, `messages_ready`, and flow-control state; confirm no queue grows unbounded.

Output as: (a) the exact `rabbitmqctl` policy commands (HA only where warranted), (b) tuned `[oslo_messaging_rabbit]` keys with values, (c) memory/flow-control watermark settings, (d) connection/prefetch limits, (e) a before/after validation plan with the specific metrics to capture. Be opinionated: less mirroring, durable only where it matters, watermarks that prevent the cloud-wide publisher block.
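
Before pasting the `rabbitmqctl list_queues name messages consumers memory` snapshot into the prompt, it can help to pre-filter it for the queues the "queue hygiene" step cares about. The following is a minimal sketch, not part of the prompt itself. It assumes the default tab-separated output with the four columns in that order; the helper names, the `min_messages` threshold, and the sample queue names are all hypothetical and only illustrate the reply/fanout naming typically seen with oslo.messaging.

```python
from typing import List, NamedTuple


class QueueStat(NamedTuple):
    name: str
    messages: int
    consumers: int
    memory: int  # bytes, as reported by the broker


def parse_list_queues(text: str) -> List[QueueStat]:
    """Parse `rabbitmqctl list_queues name messages consumers memory` output.

    Assumes tab-separated columns in exactly that order (an assumption
    about how the snapshot was captured).
    """
    stats = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        # Skip banner lines ("Listing queues ...") and the column header row:
        # only rows with a name plus three integer columns are kept.
        if len(fields) != 4 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, messages, consumers, memory = fields
        stats.append(QueueStat(name, int(messages), int(consumers), int(memory)))
    return stats


def suspect_queues(stats: List[QueueStat], min_messages: int = 100) -> List[QueueStat]:
    """Return queues with a backlog and no consumer, largest backlog first.

    The threshold is an arbitrary illustrative default, not a recommendation.
    """
    flagged = [q for q in stats if q.consumers == 0 and q.messages >= min_messages]
    return sorted(flagged, key=lambda q: q.messages, reverse=True)


# Hypothetical snapshot; real queue names and sizes will differ.
SAMPLE = (
    "Listing queues for vhost / ...\n"
    "name\tmessages\tconsumers\tmemory\n"
    "conductor\t0\t4\t55000\n"
    "reply_0123abcd\t2500\t0\t9100000\n"
    "q-agent-notifier-port-update_fanout_89ef\t740\t0\t3200000\n"
    "scheduler\t12\t2\t61000\n"
)

for q in suspect_queues(parse_list_queues(SAMPLE)):
    print(f"{q.name}: {q.messages} messages, 0 consumers, {q.memory} bytes")
```

Queues flagged this way are only candidates: a consumer that is briefly disconnected looks the same as a dead one in a single snapshot, so comparing two snapshots taken a few minutes apart gives a more reliable picture before setting TTL or expiry.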