RabbitMQ Queue Backpressure & Flow-Control Triage Prompt

Diagnose why a RabbitMQ queue is backing up and producers are being throttled, and decide whether the bottleneck is slow consumers, flow control, or a resource alarm.

Target user

Platform and SRE engineers triaging RabbitMQ throughput incidents

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a senior platform engineer who has triaged many RabbitMQ backpressure incidents where queues grow without bound and publishers stall. Walk me through diagnosing mine.

I will provide:

- `rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers consumer_utilisation` [PASTE OUTPUT]
- Connection and channel state showing flow control: `rabbitmqctl list_connections name state` and `rabbitmqctl list_channels` [PASTE OUTPUT]
- Any resource alarms: the memory/disk alarm section of `rabbitmqctl status`, and `rabbitmqctl list_queues name memory` [PASTE OUTPUT]
- Symptoms: publishers slow/blocked, growing queue depth, rising latency [DESCRIBE]

Your job:

1. **Locate the bottleneck**: separate "queue growing because consumers are slow/absent" (high `messages_ready`, low `consumer_utilisation`) from "broker is throttling producers" (connections in `flow` state) from "a memory or disk alarm has blocked all publishers" (connections in `blocking` or `blocked` state).
2. **Read the signals correctly**: explain `messages_ready` vs `messages_unacknowledged` (ready = waiting in the queue for delivery; unacked = delivered to consumers under prefetch but not yet acked), `consumer_utilisation` well below 1.0 meaning the queue is often waiting on consumers (near 1.0 means consumers accept deliveries as fast as the queue can send them), and connection `flow` state meaning internal credit-based flow control is engaged.
3. **Trace causes**: slow downstream dependency, too few consumers, prefetch too low (consumers idle waiting) or too high (one consumer hoards), large unacked backlog from a stuck consumer, or memory/disk watermark crossed.
4. **Recommend fixes**: scale or speed up consumers, tune prefetch/QoS, make the queue lazy or set a max-length with an overflow policy, clear the resource alarm, or apply backpressure deliberately at the producer with publisher confirms.
5. **Prevent recurrence**: what to alert on (queue depth trend, `messages_unacknowledged`, connections in flow, alarm state) so this is caught before publishers block.

Output as: (a) the diagnosed bottleneck with the specific metric that proves it, (b) immediate mitigation, (c) root-cause fix, (d) the alerts to add.

Validate any queue-policy or prefetch change on a staging broker before prod. Do not purge a backed-up queue to "relieve pressure" without review: purging discards real messages and hides the actual cause.

Why this prompt works

Backpressure incidents are confusing because three different mechanisms produce similar symptoms: slow consumers, RabbitMQ’s internal credit-based flow control, and resource alarms that block publishers outright. The prompt forces you to distinguish them using the exact metrics that tell them apart — messages_ready versus messages_unacknowledged, consumer_utilisation, and connection flow state — rather than guessing. That distinction changes the fix entirely: scaling consumers does nothing if the real problem is a disk-free alarm that has blocked every publisher.
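The three-way split above can be sketched as a small decision function. This is only an illustration: the `QueueStats` type, the `classify_bottleneck` name, and both thresholds are assumptions made for the example, not RabbitMQ defaults. The connection state strings (`flow`, `blocking`, `blocked`) are the values reported by `rabbitmqctl list_connections name state`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueStats:
    """One row of `rabbitmqctl list_queues ...` output, already parsed."""
    name: str
    messages_ready: int
    messages_unacknowledged: int
    consumers: int
    consumer_utilisation: float  # 0.0 to 1.0


def classify_bottleneck(
    queue: QueueStats,
    connection_states: List[str],
    alarm_active: bool,
    ready_threshold: int = 10_000,        # assumed value, tune per workload
    utilisation_threshold: float = 0.5,   # assumed value, tune per workload
) -> str:
    # A resource alarm blocks every publisher, so check it first.
    if alarm_active or any(s in ("blocking", "blocked") for s in connection_states):
        return "resource_alarm"
    # Messages are waiting and nothing is subscribed to take them.
    if queue.consumers == 0 and queue.messages_ready > 0:
        return "no_consumers"
    # A deep ready backlog plus a queue that often waits on its consumers.
    if (queue.messages_ready >= ready_threshold
            and queue.consumer_utilisation < utilisation_threshold):
        return "slow_consumers"
    # Publishers are being throttled by credit-based flow control.
    if any(s == "flow" for s in connection_states):
        return "flow_control"
    return "unclear"
```

The alarm check comes first on purpose: while an alarm is active, consumer-side metrics can look bad simply because nothing new is arriving, so they should not drive the diagnosis.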

It encodes the right mental model of unacked messages. A large messages_unacknowledged count usually means consumers have pulled work via prefetch but aren’t acking it, either because they’re slow or stuck, or because prefetch is set too high and one consumer is hoarding. Reading that signal correctly is what separates a five-minute fix from an hour of restarting random services.
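One rough way to read the unacked signal per consumer channel is sketched below. It assumes you have parsed the `messages_unacknowledged` and `prefetch_count` columns from `rabbitmqctl list_channels`. The `ChannelStats` type, the `diagnose_unacked` name, and the 80% hoarding share are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChannelStats:
    """One consumer channel, parsed from `rabbitmqctl list_channels`."""
    name: str
    messages_unacknowledged: int
    prefetch_count: int  # 0 means no prefetch limit is set


def diagnose_unacked(
    channels: List[ChannelStats],
    hoard_share: float = 0.8,  # assumed cut-off for "one consumer hoards"
) -> Dict[str, List[str]]:
    total = sum(c.messages_unacknowledged for c in channels)
    findings: Dict[str, List[str]] = {}
    for c in channels:
        notes: List[str] = []
        if c.prefetch_count == 0 and c.messages_unacknowledged > 0:
            # No limit: the broker keeps pushing messages to this consumer.
            notes.append("unlimited_prefetch")
        elif c.prefetch_count > 0 and c.messages_unacknowledged >= c.prefetch_count:
            # Window is full: the consumer is slow or stuck, not idle.
            notes.append("prefetch_window_full")
        if (len(channels) > 1 and total > 0
                and c.messages_unacknowledged / total >= hoard_share):
            # One channel holds most of the in-flight work.
            notes.append("hoarding")
        if notes:
            findings[c.name] = notes
    return findings
```

A channel flagged `prefetch_window_full` points toward a slow or stuck consumer, while `hoarding` points toward a prefetch value that is too high for fair distribution across consumers.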

The guardrails address the most common harmful reflex during a backpressure incident: purging the queue to make the number go down. That destroys real messages and erases the evidence of what caused the backup. By steering toward staging validation, deliberate producer-side backpressure with publisher confirms, and the right alerts, the prompt turns a panic into a diagnosis.
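As a sketch of what "the right alerts" might look like, the function below evaluates the four signals the prompt names: queue depth trend, unacked backlog, connections in flow, and alarm state. The function names and thresholds are assumptions for illustration. In practice you would likely express the same rules in your monitoring system and alert on a sustained `flow` state rather than a single sample, since brief flow control can be normal under load.

```python
from typing import List, Tuple


def depth_slope(samples: List[Tuple[float, int]]) -> float:
    """Least-squares slope of (time_seconds, queue_depth) in messages/second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0
    cov = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    return cov / var_t


def evaluate_alerts(
    samples: List[Tuple[float, int]],
    unacked: int,
    flow_connections: int,
    alarm_active: bool,
    max_growth_per_sec: float = 50.0,  # assumed threshold
    max_unacked: int = 5_000,          # assumed threshold
) -> List[str]:
    fired: List[str] = []
    if depth_slope(samples) > max_growth_per_sec:
        fired.append("queue_depth_growing")
    if unacked > max_unacked:
        fired.append("unacked_backlog")
    if flow_connections > 0:
        fired.append("connections_in_flow")
    if alarm_active:
        fired.append("resource_alarm")
    return fired
```

Alerting on the slope rather than on absolute depth is a deliberate choice here: a queue that is deep but draining needs less attention than a shallow one that is growing steadily.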

Related prompts

RabbitMQ Memory & Disk Alarm Resource-Limit Triage Prompt
