RabbitMQ Error Guide: 'flow' Connection State (Internal Flow Control)
Fix RabbitMQ publishers stuck in 'flow' state: diagnose internal credit-based flow control from slow queues, disk, or CPU, distinct from resource alarms.
- #rabbitmq
- #troubleshooting
- #errors
- #flow-control
Exact Error Message
There is no AMQP exception for this condition. Instead the connection or channel is reported in the flow state, and the management UI shows a yellow “flow” badge. You see it in rabbitmqctl output and in client metrics:
Listing connections ...
name state
10.0.5.31:51544 -> 10.0.4.21:5672 flow
10.0.5.31:51602 -> 10.0.4.21:5672 running
Listing channels ...
pid connection state
<rabbit@mq-01.3.812.0> 10.0.5.31:51544 -> 10.0.4.21:5672 flow
Client libraries do not expose this state directly. Their blocked-connection hooks (the Java client's BlockedListener, pika's connection-blocked callbacks) fire only on connection.blocked, which is raised by a resource alarm and not by flow control. Most clients simply show basic.publish calls taking longer and longer to return as TCP back-pressure builds.
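Since the state only shows up in broker-side listings, a small script can pull the affected connections out of that output. The following is a minimal sketch, assuming the listing has one connection per line with the state as the last whitespace-separated field, as in the sample above. The function names are illustrative, not part of any RabbitMQ tooling.

```python
import subprocess


def connections_in_flow(listing: str) -> list[str]:
    """Return the names of connections whose last column is the state 'flow'."""
    flowing = []
    for line in listing.splitlines():
        # Connection names contain spaces ("a -> b"), so split off only the
        # final field and treat everything before it as the name.
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1] == "flow":
            flowing.append(parts[0].strip())
    return flowing


def list_connections_output() -> str:
    """Run the same command the guide uses and return its stdout.

    Assumes rabbitmqctl is on PATH and the caller may talk to the node.
    """
    result = subprocess.run(
        ["rabbitmqctl", "list_connections", "name", "state"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Header lines such as "Listing connections ..." and "name state" are skipped naturally because their last field is never the literal word flow.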
What the Error Means
flow is RabbitMQ’s internal credit-based flow control, and it is not the same as a resource alarm or connection.blocked. Every stage of the publish pipeline (the connection reader, the channel process, the queue process, and the message store) hands a fixed number of message credits to the stage upstream of it. When a downstream stage (typically a queue) cannot keep up, it stops granting credit, the stage above it runs out, and so on back up to the connection reader, which then stops reading from the TCP socket.
The result: the slowest queue throttles the connection that feeds it. A connection in flow is being deliberately rate-limited so a fast publisher cannot overwhelm a slow queue. Unlike a resource alarm, this is per-connection and continuous — there is no broker-wide block and no entry under Alarms.
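The credit mechanism can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the broker's implementation: it collapses the pipeline to one sender and one queue, uses discrete ticks, and the credit values (8 initial, 4 per grant) are made up for readability.

```python
def simulate(publish_rate, queue_rate, ticks, initial_credit=8, grant_every=4):
    """Return the sender's state ('running' or 'flow') for each tick.

    The sender spends one credit per message it forwards. The queue handles
    at most queue_rate messages per tick and hands back grant_every credits
    each time it has processed that many messages.
    """
    credit = initial_credit
    mailbox = 0                  # messages waiting in the queue's mailbox
    processed_since_grant = 0
    states = []
    for _ in range(ticks):
        # The sender forwards as much as its remaining credit allows.
        sent = min(publish_rate, credit)
        credit -= sent
        mailbox += sent
        states.append("flow" if sent < publish_rate else "running")

        # The queue drains its mailbox at its own pace.
        handled = min(queue_rate, mailbox)
        mailbox -= handled
        processed_since_grant += handled

        # Credit is returned in batches, only after real work was done.
        while processed_since_grant >= grant_every:
            processed_since_grant -= grant_every
            credit += grant_every
    return states


if __name__ == "__main__":
    print(simulate(publish_rate=2, queue_rate=4, ticks=10))  # queue keeps up
    print(simulate(publish_rate=4, queue_rate=1, ticks=10))  # queue is slower
```

When the queue is faster than the publisher the sender never leaves running. When the queue is slower, the sender alternates between short bursts and longer stretches in flow, and its average throughput settles at the queue's rate, which is the throttling effect described above.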
Common Causes
- A slow or CPU-bound queue process. Classic queues are single Erlang processes; a queue doing heavy work (indexing, large message bodies, lazy-to-disk paging) cannot grant credit fast enough.
- Disk I/O saturation on persistent messages. Persistent messages on a slow disk make the message store the bottleneck, throttling every publisher writing durable messages.
- A publisher genuinely faster than any consumer. If you publish faster than the queue can durably accept and downstream can drain, flow control is the broker protecting itself short of an alarm.
- High vm_memory_high_watermark_paging_ratio paging activity. When a queue pages messages to disk, its process spends time on I/O instead of granting credit.
- CPU starvation / too few schedulers. An overloaded node where Erlang schedulers are saturated cannot run queue processes promptly, so credit is granted slowly.
- Quorum queue followers lagging. A quorum queue that cannot replicate to a majority quickly applies back-pressure to the leader, which flows back to publishers.
How to Reproduce the Error
Publish persistent messages to a single classic queue with no consumer attached, faster than the disk can absorb:
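# Load sketch: one publisher, durable queue, no consumer.
# Assumes the pika client and a broker on localhost; "q" is a placeholder queue name.
import pika
channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
channel.queue_declare(queue="q", durable=True)
persistent = pika.BasicProperties(delivery_mode=2)  # 2 = persistent
while True:  # stop with Ctrl-C
    channel.basic_publish(exchange="", routing_key="q", body=b"x" * 65536, properties=persistent)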
Within seconds the connection reader runs out of credit and the connection enters flow. Add a slow disk (or delivery_mode=2 on a network volume) and the effect appears almost immediately. Crucially, rabbitmqctl status shows no alarm — only the connection state flips to flow.
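To watch the effect from the publisher side while such a load runs, one option is to time each publish call and flag a sustained rise. The wrapper below is a hypothetical helper, not part of any client library. It assumes a blocking client where the publish call does not return until the bytes are written, and the window and threshold values are placeholders to tune.

```python
import time
from collections import deque


class PublishLatencyMonitor:
    """Wrap a publish callable and track how long recent calls took."""

    def __init__(self, publish, window=50, threshold_s=0.05, clock=time.monotonic):
        self._publish = publish
        self._samples = deque(maxlen=window)  # most recent call durations
        self._threshold_s = threshold_s
        self._clock = clock                   # injectable for testing

    def publish(self, *args, **kwargs):
        start = self._clock()
        result = self._publish(*args, **kwargs)
        self._samples.append(self._clock() - start)
        return result

    def mean_latency(self):
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)

    def looks_throttled(self):
        # Only judge once the window is full, so one slow call is not a signal.
        window_full = len(self._samples) == self._samples.maxlen
        return window_full and self.mean_latency() > self._threshold_s
```

A publisher could wrap its client's publish method, for example PublishLatencyMonitor(channel.basic_publish), and log when looks_throttled() turns true. Slow publishes can have other causes (network, confirms), so this is a hint to go check the connection state on the broker, not proof of flow control.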
Diagnostic Commands
# Which connections are in flow control right now?
rabbitmqctl list_connections name state | grep -i flow
# Which channels are in flow, and on which connection?
rabbitmqctl list_channels pid connection state | grep -i flow
# Confirm this is NOT a resource alarm (flow is independent of alarms)
rabbitmqctl status | grep -iA4 'Alarms'
# Find the bottleneck queue: deepest queues are listed last
# (rabbitmqctl has no --sort flag, so sort on the tab-separated messages column)
rabbitmqctl list_queues -q --no-table-headers name messages messages_ready \
message_bytes_persistent | sort -t "$(printf '\t')" -k2,2n | tail -10
# Per-queue consumer count to spot under-consumed queues
rabbitmqctl list_queues -q --no-table-headers name messages consumers messages_unacknowledged \
| sort -t "$(printf '\t')" -k2,2n | tail -10
# Erlang run-queue / scheduler pressure on the node
rabbitmq-diagnostics runtime_thread_stats 2>/dev/null | head -20
# Disk and I/O context for the data directory
df -h $(rabbitmqctl eval 'rabbit_mnesia:dir().' | tr -d '"')
The tell is a connection in flow while Alarms is (none). Cross-reference the busiest queue from list_queues — that queue’s process is almost always the stage refusing to grant credit.
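The cross-referencing step can be scripted. The sketch below assumes tab-separated output from rabbitmqctl list_queues name messages consumers messages_unacknowledged, and ranks queues with a simple heuristic of my own choosing: backlogged queues with no consumers first, then by depth. The names QueueRow, parse_queue_rows, and likely_bottlenecks are illustrative.

```python
from typing import NamedTuple


class QueueRow(NamedTuple):
    name: str
    messages: int
    consumers: int
    unacked: int


def parse_queue_rows(text: str) -> list[QueueRow]:
    """Parse tab-separated rows: name, messages, consumers, unacknowledged."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # banner lines or unexpected shapes
        name, messages, consumers, unacked = parts
        if not (messages.isdigit() and consumers.isdigit() and unacked.isdigit()):
            continue  # header row
        rows.append(QueueRow(name, int(messages), int(consumers), int(unacked)))
    return rows


def likely_bottlenecks(rows: list[QueueRow], top: int = 3) -> list[QueueRow]:
    """Rank backlogged queues: zero-consumer queues first, then deepest first."""
    backlog = [row for row in rows if row.messages > 0]
    # False sorts before True, so queues without consumers lead the list.
    return sorted(backlog, key=lambda row: (row.consumers > 0, -row.messages))[:top]
```

A single snapshot shows depth, not growth. Running this twice a minute apart and comparing the messages values gives a better picture of which queue is actually falling behind.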
Step-by-Step Resolution
- Confirm it is flow control, not an alarm. Run rabbitmqctl status | grep -iA4 Alarms. If it shows (none) but connections are flow, this is internal flow control and the fix is to speed up the downstream stage, not to free a resource.
- Identify the bottleneck queue. The connection in flow feeds one or more queues. Use list_queues sorted by messages and consumers to find the queue with growing depth and too few (or zero) consumers.
- Scale or restore consumers. If the queue is under-consumed, add consumer instances or raise consumer concurrency so the queue process drains and can grant credit again.
- Reduce per-message cost. Large message bodies and delivery_mode=2 (persistent) on slow disk are the usual culprits. Move large payloads to object storage and pass a reference, or move the data volume to faster storage (local NVMe over network disk).
- Spread load across queues. A single classic queue is one process and one CPU core. Shard high-throughput traffic across multiple queues (or use a sharding plugin / multiple quorum queues) so no single process is the bottleneck.
- Right-size the node. If runtime_thread_stats shows saturated schedulers, the node is CPU-starved. Add vCPUs or move queues to additional nodes.
- Verify recovery. Re-run rabbitmqctl list_connections name state | grep -i flow. An empty result means publishers are no longer throttled.
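For the sharding step, one simple publisher-side approach is to derive the target queue from a stable hash of a message key. This is a minimal sketch, not the sharding plugin's behavior. The naming scheme base.N and the function name are assumptions, and the queues themselves must already be declared and consumed.

```python
import zlib


def shard_queue_name(base: str, key: str, shards: int) -> str:
    """Map a message key to one of `shards` queue names, deterministically."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # crc32 is stable across processes and runs, unlike Python's built-in
    # hash(), which is randomized per process for strings.
    index = zlib.crc32(key.encode("utf-8")) % shards
    return f"{base}.{index}"


if __name__ == "__main__":
    for order_id in ("order-1001", "order-1002", "order-1003"):
        print(order_id, "->", shard_queue_name("orders", order_id, 4))
```

Messages with the same key always land on the same queue, so per-key ordering is preserved, but ordering across keys is not. Changing the shard count remaps most keys, so it is worth picking a count with headroom up front.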
Prevention and Best Practices
- Treat sustained flow as a capacity signal, not a transient: alert when any connection stays in flow for more than a few minutes.
- Keep message bodies small; store large blobs externally and publish references.
- Use faster, local disk for persistent workloads and benchmark fsync latency before going to production.
- Shard hot traffic across multiple queues so one queue process is never the systemic bottleneck.
- Monitor per-queue depth and consumer count so an under-consumed queue is caught before it throttles its publishers.
- Size nodes for peak publish rate and keep Erlang scheduler utilization with headroom.
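The alerting practice above can be sketched as a small poll-based tracker. It assumes something else supplies, on each poll, the set of connection names currently in flow (for example by parsing rabbitmqctl list_connections name state). The class name and the 180-second default are placeholders.

```python
class FlowTracker:
    """Report connections that have been seen in flow on every poll for a while."""

    def __init__(self, threshold_s: float = 180.0):
        self._threshold_s = threshold_s
        self._first_seen = {}  # connection name -> time it was first seen in flow

    def observe(self, flowing, now: float) -> list[str]:
        """Record one poll and return names in flow for at least the threshold."""
        flowing = set(flowing)
        # A connection that left flow starts over if it comes back later.
        for name in list(self._first_seen):
            if name not in flowing:
                del self._first_seen[name]
        for name in flowing:
            self._first_seen.setdefault(name, now)
        return sorted(
            name
            for name, since in self._first_seen.items()
            if now - since >= self._threshold_s
        )
```

Because this only sees poll snapshots, a connection that briefly leaves flow between two polls still counts as continuous. A shorter poll interval reduces that blind spot at the cost of more rabbitmqctl calls.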
Related Errors
- resource alarm (memory/disk): a broker-wide block under Alarms, distinct from per-connection flow. See the RabbitMQ resource alarm guide.
- connection.blocked: an explicit AMQP notification raised by a resource alarm, not by flow control.
- missed heartbeats / timeout: a saturated node in flow can also miss heartbeats if it is severely overloaded.
- publisher nack received: a persistent-message bottleneck under flow can escalate into nacks if the queue cannot persist at all.
Frequently Asked Questions
Is flow an error I need to fix immediately?
Not necessarily. Brief flow control under burst load is normal and self-correcting. Sustained flow means a queue or the node cannot keep up and needs attention.
How is flow different from blocked/blocking?
blocked/blocking come from a broker-wide resource alarm and appear under Alarms. flow is per-connection internal credit control with no alarm and no client-visible connection.blocked notification.
Will my client get an exception?
No. The client simply sees basic.publish slow down as TCP back-pressure builds. There is no channel close and no AMQP error code.
Does disabling publisher confirms help?
No. Confirms are unrelated: flow control throttles the byte stream regardless of confirm mode. Fix the downstream bottleneck instead.
Can quorum queues cause flow control?
Yes. If a quorum queue cannot replicate to a majority quickly, the leader applies back-pressure that surfaces as flow on the publishing connection.