AI for RabbitMQ · By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'flow' Connection State Internal Flow Control

Fix RabbitMQ publishers stuck in 'flow' state: diagnose internal credit-based flow control from slow queues, disk, or CPU, distinct from resource alarms.

  • #rabbitmq
  • #troubleshooting
  • #errors
  • #flow-control

Exact Error Message

There is no AMQP exception for this condition. Instead, the connection or channel is reported in the flow state, and the management UI shows a yellow “flow” badge. It appears in rabbitmqctl output like this:

Listing connections ...
name                              state
10.0.5.31:51544 -> 10.0.4.21:5672 flow
10.0.5.31:51602 -> 10.0.4.21:5672 running

Listing channels ...
pid             connection                          state
<rabbit@mq-01.3.812.0>  10.0.5.31:51544 -> 10.0.4.21:5672  flow

Client libraries do not surface flow directly. The Java client's BlockedListener and pika's blocked/unblocked connection callbacks fire only on connection.blocked, which is raised by a resource alarm; internal flow control triggers neither. What publishers see instead is basic.publish calls taking longer and longer to return as TCP back-pressure builds.
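Because the broker sends no notification, one way to notice throttling from the publisher side is to time each publish call. The following is a minimal sketch, not a feature of any client library: PublishTimer, its window size, and the slow_factor of 3 are all hypothetical choices, and publish stands for whatever callable your client uses to send a message.

```python
import time
from collections import deque
from statistics import median


class PublishTimer:
    """Wraps a publish callable and tracks how long each call takes.

    Hypothetical helper: the window and slow_factor defaults are
    illustrative assumptions, not values taken from RabbitMQ.
    """

    def __init__(self, publish, window=5, slow_factor=3.0, clock=time.monotonic):
        self.publish = publish
        self.clock = clock
        self.slow_factor = slow_factor
        self.baseline = None              # median of the first full window
        self.recent = deque(maxlen=window)

    def __call__(self, *args, **kwargs):
        start = self.clock()
        result = self.publish(*args, **kwargs)
        self.recent.append(self.clock() - start)
        # Freeze the baseline the first time the window fills up.
        if self.baseline is None and len(self.recent) == self.recent.maxlen:
            self.baseline = median(self.recent)
        return result

    def looks_throttled(self):
        """True when the recent median is at least slow_factor x the baseline."""
        if self.baseline is None or self.baseline == 0:
            return False
        return median(self.recent) >= self.slow_factor * self.baseline
```

Slow publishes can have other causes (network latency, a busy client host), so treat a positive result as a prompt to check the connection state on the broker rather than as proof of flow control.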

What the Error Means

flow is RabbitMQ’s internal credit-based flow control, and it is not the same as a resource alarm or connection.blocked. Each process in the publish pipeline (the connection reader, the channel process, the queue process, and the message store) grants a fixed number of message credits to the process upstream of it. When a downstream process (typically a queue) cannot keep up, it stops granting credit, the process above it runs out, and so on back up to the connection reader, which then stops reading from the TCP socket.

The result: the slowest queue throttles the connection that feeds it. A connection in flow is being deliberately rate-limited so a fast publisher cannot overwhelm a slow queue. Unlike a resource alarm, this is per-connection and continuous — there is no broker-wide block and no entry under Alarms.
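A toy model can make the credit hand-off concrete. This sketch is not RabbitMQ's implementation: the CreditLink class, the tick-based loop, and the credit numbers (4 initial credits, 2 more granted after every 2 handled messages) are illustrative assumptions chosen to keep the trace short.

```python
from collections import deque


class CreditLink:
    """Toy sender -> receiver link with credit-based back-pressure."""

    def __init__(self, initial_credit, more_credit_after):
        self.credit = initial_credit            # sends the sender may still make
        self.more_credit_after = more_credit_after
        self.mailbox = deque()                  # messages waiting at the receiver
        self.handled_since_grant = 0

    def try_send(self, msg):
        """Sender side: returns False when the sender is out of credit."""
        if self.credit == 0:
            return False
        self.credit -= 1
        self.mailbox.append(msg)
        return True

    def handle(self, max_messages):
        """Receiver side: process messages, granting credit back in batches."""
        for _ in range(min(max_messages, len(self.mailbox))):
            self.mailbox.popleft()
            self.handled_since_grant += 1
            if self.handled_since_grant == self.more_credit_after:
                self.credit += self.more_credit_after
                self.handled_since_grant = 0


def simulate(ticks, publish_per_tick, handle_per_tick,
             initial_credit=4, more_credit_after=2):
    """Return one state per tick: 'running', or 'flow' if a send was refused."""
    link = CreditLink(initial_credit, more_credit_after)
    states = []
    for _ in range(ticks):
        blocked = False
        for i in range(publish_per_tick):
            if not link.try_send(i):
                blocked = True
                break
        link.handle(handle_per_tick)
        states.append("flow" if blocked else "running")
    return states
```

With a publisher attempting 3 messages per tick against a receiver that handles 1, simulate(5, 3, 1) reports running for the first tick and flow for every tick after it. Reversing the ratio, simulate(5, 1, 2), stays in running throughout.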

Common Causes

  • A slow or CPU-bound queue process. Classic queues are single Erlang processes; a queue doing heavy work (indexing, large message bodies, lazy-to-disk paging) cannot grant credit fast enough.
  • Disk I/O saturation on persistent messages. Persistent messages on a slow disk make the message store the bottleneck, throttling every publisher writing durable messages.
  • A publisher genuinely faster than any consumer. If you publish faster than the queue can durably accept and downstream can drain, flow control is the broker protecting itself short of an alarm.
  • Heavy paging triggered by vm_memory_high_watermark_paging_ratio. When a queue pages messages to disk, its process spends time on I/O instead of granting new credit.
  • CPU starvation / too few schedulers. An overloaded node where Erlang schedulers are saturated cannot run queue processes promptly, so credit is granted slowly.
  • Quorum queue followers lagging. A quorum queue that cannot replicate to a majority quickly applies back-pressure to the leader, which flows back to publishers.

How to Reproduce the Error

Publish persistent messages to a single classic queue with no consumer attached, faster than the disk can absorb:

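# load sketch using pika: one publisher, durable queue, no consumer (host and queue name are placeholders)
import pika
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="flow-test", durable=True)
props = pika.BasicProperties(delivery_mode=2)  # persistent messages
while True:
    ch.basic_publish(exchange="", routing_key="flow-test", body=b"x" * 65536, properties=props)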

Within seconds the connection reader runs out of credit and the connection enters flow. On a slow disk (for example, a data directory on a network volume) the effect appears almost immediately. Crucially, rabbitmqctl status shows no alarm; only the connection state flips to flow.

Diagnostic Commands

# Which connections are in flow control right now?
rabbitmqctl list_connections name state | grep -i flow

# Which channels are in flow, and on which connection?
rabbitmqctl list_channels pid connection state | grep -i flow

# Confirm this is NOT a resource alarm (flow is independent of alarms)
rabbitmqctl status | grep -iA4 'Alarms'

# Find the bottleneck queue: high message rate, growing depth
# (rabbitmqctl has no sort option, so sort the tab-separated output on the messages column)
rabbitmqctl --quiet --no-table-headers list_queues name messages messages_ready \
  message_bytes_persistent | sort -t$'\t' -k2,2n | tail -10

# Per-queue consumer count to spot under-consumed queues
rabbitmqctl --quiet --no-table-headers list_queues name messages consumers messages_unacknowledged \
  | sort -t$'\t' -k2,2n | tail -10

# Erlang run-queue / scheduler pressure on the node
rabbitmq-diagnostics runtime_thread_stats 2>/dev/null | head -20

# Disk and I/O context for the data directory
df -h $(rabbitmqctl eval 'rabbit_mnesia:dir().' | tr -d '"')

The tell is a connection in flow while Alarms is (none). Cross-reference the busiest queue from list_queues — that queue’s process is almost always the stage refusing to grant credit.
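The cross-referencing described above can be scripted once the command output has been captured as text. This is a rough sketch under assumptions: both function names are hypothetical, flow_connections expects the state to be the last column, and suspect_queues expects exactly three columns (name, messages, consumers), as produced by rabbitmqctl list_queues name messages consumers.

```python
def flow_connections(listing):
    """Return the names of connections whose last column is 'flow'."""
    names = []
    for line in listing.splitlines():
        # Split off the final column; connection names contain spaces.
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1] == "flow":
            names.append(parts[0])
    return names


def suspect_queues(listing, top=3):
    """Rank queues by depth. Returns (name, messages, consumers) tuples."""
    rows = []
    for line in listing.splitlines():
        parts = line.strip().rsplit(None, 2)
        # Banner lines, header rows and malformed rows fail this check.
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        rows.append((parts[0], int(parts[1]), int(parts[2])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

A deep queue with zero consumers at the top of suspect_queues, alongside a non-empty flow_connections result, points at an under-consumed queue as the stage withholding credit.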

Step-by-Step Resolution

  1. Confirm it is flow control, not an alarm. Run rabbitmqctl status | grep -iA4 Alarms. If it shows (none) but connections are flow, this is internal flow control and the fix is to speed up the downstream stage, not to free a resource.

  2. Identify the bottleneck queue. The connection in flow feeds one or more queues. Use list_queues sorted by messages and consumers to find the queue with growing depth and too few (or zero) consumers.

  3. Scale or restore consumers. If the queue is under-consumed, add consumer instances or raise consumer concurrency so the queue process drains and can grant credit again.

  4. Reduce per-message cost. Large message bodies and delivery_mode=2 (persistent) on slow disk are the usual culprits. Move large payloads to object storage and pass a reference, or move the data volume to faster storage (local NVMe over network disk).

  5. Spread load across queues. A single classic queue is one process and one CPU core. Shard high-throughput traffic across multiple queues (or use a sharding plugin / multiple quorum queues) so no single process is the bottleneck.

  6. Right-size the node. If runtime_thread_stats shows saturated schedulers, the node is CPU-starved. Add vCPUs or move queues to additional nodes.

  7. Verify recovery. Re-run rabbitmqctl list_connections name state | grep -i flow. An empty result means publishers are no longer throttled.
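For step 5, client-side sharding can be as small as a stable hash over a routing key. The sketch below is one possible approach, not a RabbitMQ API: shard_queue, the "orders" base name, and the shard count of 4 are hypothetical, and it assumes the shard queues already exist and each has consumers.

```python
import zlib


def shard_queue(routing_key, base="orders", shards=4):
    """Map a key to one of `shards` queue names.

    zlib.crc32 is used instead of the built-in hash() because it gives the
    same result in every process, so all publishers agree on the mapping.
    """
    index = zlib.crc32(routing_key.encode("utf-8")) % shards
    return f"{base}.{index}"
```

Messages that share a key always land on the same queue, so per-key ordering is preserved while the overall load is spread across several queue processes. Changing the shard count remaps keys, so pick it with headroom.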

Prevention and Best Practices

  • Treat sustained flow as a capacity signal, not a transient: alert when any connection stays in flow for more than a few minutes.
  • Keep message bodies small; store large blobs externally and publish references.
  • Use faster, local disk for persistent workloads and benchmark fsync latency before going to production.
  • Shard hot traffic across multiple queues so one queue process is never the systemic bottleneck.
  • Monitor per-queue depth and consumer count so an under-consumed queue is caught before it throttles its publishers.
  • Size nodes for peak publish rate and keep Erlang scheduler utilization with headroom.

Related Errors

  • resource alarm (memory/disk): a broker-wide block under Alarms, distinct from per-connection flow. See the RabbitMQ resource alarm guide.
  • connection.blocked: an explicit AMQP notification raised by a resource alarm, not by flow control.
  • missed heartbeats / timeout: a saturated node in flow can also miss heartbeats if it is severely overloaded.
  • publisher nack received: a persistent-message bottleneck under flow can escalate into nacks if the queue cannot persist at all.

Frequently Asked Questions

Is flow an error I need to fix immediately? Not necessarily. Brief flow control under burst load is normal and self-correcting. Sustained flow means a queue or the node cannot keep up and needs attention.
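Telling brief flow from sustained flow takes a little state across samples. Here is a minimal sketch, assuming you already poll the broker periodically and collect the set of connection names currently in flow; sustained_flow and the 180-second threshold are hypothetical choices, not RabbitMQ defaults.

```python
def sustained_flow(samples, threshold_seconds=180):
    """Return connections that stayed in flow for at least threshold_seconds.

    samples: iterable of (timestamp_seconds, set_of_connection_names_in_flow),
    in time order. A connection missing from a sample resets its streak.
    """
    since = {}       # name -> timestamp at which the current streak began
    alerts = set()
    for ts, in_flow in samples:
        # Drop streaks for connections that left the flow state.
        for name in list(since):
            if name not in in_flow:
                del since[name]
        for name in in_flow:
            start = since.setdefault(name, ts)
            if ts - start >= threshold_seconds:
                alerts.add(name)
    return alerts
```

A connection that dips in and out of flow during bursts never accumulates a long enough streak, while one that is throttled continuously for the whole threshold is reported.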

How is flow different from blocked/blocking? blocked/blocking come from a broker-wide resource alarm and appear under Alarms. flow is per-connection internal credit control with no alarm and no client-visible connection.blocked notification.

Will my client get an exception? No. The client simply sees basic.publish slow down as TCP back-pressure builds. There is no channel close and no AMQP error code.

Does disabling publisher confirms help? No — confirms are unrelated. Flow control throttles the byte stream regardless of confirm mode. Fix the downstream bottleneck instead.

Can quorum queues cause flow control? Yes. If a quorum queue cannot replicate to a majority quickly, the leader applies back-pressure that surfaces as flow on the publishing connection.
