RabbitMQ Connection & Channel Leak Debugging Prompt
Track down why RabbitMQ connection or channel counts keep climbing until the broker hits limits, and find the client code that opens but never closes them.
- Target user: Backend and platform engineers debugging RabbitMQ client resource leaks
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior platform engineer who has chased down RabbitMQ connection and channel leaks that slowly exhausted broker limits. Help me find mine.

I will provide:
- `rabbitmqctl list_connections name user peer_host channels state` [PASTE OUTPUT]
- `rabbitmqctl list_channels connection number consumer_count messages_unacknowledged` [PASTE OUTPUT]
- The growth pattern over time (steady climb, climbs under load, never drops after deploys) [DESCRIBE]
- The client library/framework and how the app creates connections and channels [DESCRIBE]

Your job:
1. **Confirm a leak vs legitimate growth**: distinguish a steadily climbing count that never falls (leak) from expected per-load scaling. Identify which `peer_host`/`user`/app is accumulating connections or channels.
2. **Find the anti-pattern**: the usual culprits are opening a new connection per message or per request instead of long-lived connections, opening a channel per publish without closing it, not closing channels/connections on error paths, or recreating connections on every retry.
3. **Recommend the correct model**: one (or a small pool of) long-lived connections per process, channels scoped to a unit of work and closed in a finally/using block, a publisher channel separate from consumer channels, and reconnection logic that reuses rather than multiplies.
4. **Check broker-side limits**: `channel_max`, connection limits, and file-descriptor headroom; explain what happens when they're hit (new connections refused, broker instability).
5. **Add guardrails**: alert on connection/channel count trend and per-connection channel count, and add client-side metrics so a leak is caught early.

Output as: (a) the leaking source identified from the listings, (b) the specific anti-pattern, (c) the corrected connection/channel lifecycle, (d) the limits and alerts to add. Reproduce and confirm the fix on a staging broker before deploying.
Do not force-close connections on a prod broker just to clear the count without identifying the source: you risk losing in-flight messages, and the count will simply refill while the leaking client keeps running.
Why this prompt works
Connection and channel leaks are slow-motion outages: the count creeps up over hours or days until the broker refuses new connections or destabilizes, often right after a deploy has made the leak worse. The prompt starts by separating a genuine leak (a count that climbs and never falls) from legitimate per-load scaling, and uses the list_connections and list_channels output to pin the leak to a specific host, user, or app rather than guessing.
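As a rough illustration of pinning the growth to a source, the sketch below groups a pasted `rabbitmqctl list_connections name user peer_host channels state` listing by user and peer host. It assumes the output is tab-separated with the columns in that order, and that banner or header lines can be skipped; the function name `summarize_connections` is hypothetical, not part of any RabbitMQ tooling.

```python
from collections import defaultdict


def summarize_connections(listing: str) -> list[tuple[str, str, int, int]]:
    """Group connection rows by (user, peer_host).

    Returns (user, peer_host, connection_count, channel_count) tuples,
    largest connection count first. Assumes tab-separated rows in the order
    name, user, peer_host, channels, state.
    """
    totals = defaultdict(lambda: [0, 0])  # (user, host) -> [connections, channels]
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # banner or blank line, not a data row
        _name, user, peer_host, channels, _state = fields
        if not channels.isdigit():
            continue  # header row ("channels" is not a number)
        entry = totals[(user, peer_host)]
        entry[0] += 1
        entry[1] += int(channels)
    rows = [(user, host, conns, chans) for (user, host), (conns, chans) in totals.items()]
    rows.sort(key=lambda row: (-row[2], -row[3]))
    return rows
```

Running it on listings captured a few hours apart and comparing the top rows shows which user and host pair is accumulating connections or channels, which is the evidence the prompt asks for before any fix is proposed.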
It targets the handful of client anti-patterns behind most leaks: opening a connection per request, opening a channel per publish, or failing to close resources on error paths. Channels are cheap but not free, and connections are genuinely expensive (each one is a TCP connection the broker must track), so tying either to a single message both leaks resources and destroys throughput. The prompt prescribes the correct lifecycle (long-lived pooled connections, work-scoped channels closed in a finally block, and reconnection that reuses rather than multiplies), which is the actual fix.
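A minimal sketch of the work-scoped channel pattern follows. It is written against any connection object that exposes `channel()`, and any channel that exposes `is_open`, `close()`, and `basic_publish(exchange=..., routing_key=..., body=...)`. That shape matches pika's blocking API, but the choice of client library is an assumption, and the helper names `scoped_channel` and `publish_batch` are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def scoped_channel(connection):
    """Open a channel on an existing long-lived connection and always close it.

    The connection is created once per process elsewhere and passed in;
    this helper never opens or closes the connection itself.
    """
    channel = connection.channel()
    try:
        yield channel
    finally:
        # Runs on success and on error paths, so the channel cannot leak.
        if channel.is_open:
            channel.close()


def publish_batch(connection, exchange, routing_key, bodies):
    """Publish a unit of work on one channel instead of one channel per message."""
    with scoped_channel(connection) as channel:
        for body in bodies:
            channel.basic_publish(exchange=exchange, routing_key=routing_key, body=body)
```

The connection is deliberately a parameter rather than something the helper creates: the caller owns one long-lived connection, and each unit of work borrows a channel that is closed even if a publish raises.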
The guardrails block the tempting but harmful reaction of force-closing connections on the broker to make the number drop. That interrupts in-flight work (unconfirmed publishes can be lost and unacknowledged deliveries are requeued), and the count simply refills because the buggy client is still running. By insisting on finding the source, checking channel_max and file-descriptor headroom, and adding trend alerts, the prompt turns a recurring mystery into a one-time fix with early warning.
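One way a trend alert could tell a leak from load-driven scaling is to watch the floor of the count rather than its peaks: under legitimate load the count returns to a baseline, while a leak raises the baseline itself. The sketch below is a heuristic under that assumption, not a standard RabbitMQ check. The function name `floor_is_rising`, the window size, and the requirement of at least three windows are all illustrative choices.

```python
def floor_is_rising(counts: list[int], window: int) -> bool:
    """Return True when the per-window minimum increases in every window.

    counts: connection (or channel) counts sampled at a fixed interval.
    window: number of samples per window; should span at least one full
            load cycle so that normal peaks and troughs fall inside it.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Minimum of each complete, non-overlapping window.
    minima = [
        min(counts[start:start + window])
        for start in range(0, len(counts) - window + 1, window)
    ]
    if len(minima) < 3:
        return False  # too little history to call it a trend
    return all(later > earlier for earlier, later in zip(minima, minima[1:]))
```

A series that spikes under load but keeps returning to roughly the same trough would not trip this check, whereas one whose troughs keep climbing would, which mirrors the leak definition used in the prompt.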