RabbitMQ Error Guide: '{socket_error, epipe}' Broken Pipe on Write

Exact Error Message

A broken-pipe error means RabbitMQ (or a client) tried to write to a TCP socket whose other end is already gone. The write fails with epipe — the POSIX EPIPE errno. It is the write-side counterpart to a connection reset: instead of receiving an RST while reading, the broker discovers the dead peer when it attempts to send a frame.

2026-06-29 11:18:04.661 [warning] <0.30551.6> closing AMQP connection <0.30551.6> (10.0.6.88:55012 -> 10.0.4.21:5672, vhost: 'prod', user: 'feed-consumer'):
{writer,send_failed,{error,epipe}}
2026-06-29 11:18:04.662 [error] <0.30551.6> error on AMQP connection <0.30551.6> (10.0.6.88:55012 -> 10.0.4.21:5672, state: running):
{inet_error,epipe}

A client publisher pushing a large batch may see:

pika.exceptions.StreamLostError: Stream connection lost: BrokenPipeError(32, 'Broken pipe')

What the Error Means

epipe is raised when a process writes to a socket (or pipe) that has been closed by the peer. In RabbitMQ the broker’s connection writer process holds frames destined for the client; when it flushes them and the client’s socket is already gone, the kernel returns EPIPE and the writer fails with {writer, send_failed, {error, epipe}}.

The defining characteristic is direction: the failure is detected on a write, not a read. That usually means the broker had data queued to deliver — most commonly messages for a consumer — and the consumer disappeared. So broken-pipe errors skew toward the delivery path (broker writing to consumers) and toward large or backed-up writes, whereas resets skew toward idle reads.
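The direction distinction can be seen locally, without RabbitMQ, in the minimal sketch below. It uses a Unix socket pair as a stand-in for the broker-to-client TCP connection. This is an assumption made so the example runs anywhere: over real TCP, a peer that sends an RST makes reads fail with ECONNRESET rather than return end-of-file. Once the peer has closed, a read returns end-of-file with no error, while a write raises BrokenPipeError, whose errno is EPIPE.

```python
import errno
import socket


def probe_closed_peer() -> tuple[bytes, int]:
    """Return (what a read sees, errno a write gets) once the peer has closed."""
    local, peer = socket.socketpair()  # connected pair of local stream sockets
    peer.close()                       # the remote side goes away
    try:
        read_result = local.recv(1024)  # read side: end-of-file (b""), no error
        try:
            local.sendall(b"frame")     # write side: the kernel rejects the write
        except OSError as exc:          # BrokenPipeError is an OSError subclass
            return read_result, exc.errno
        return read_result, 0
    finally:
        local.close()


if __name__ == "__main__":
    data, err = probe_closed_peer()
    print(data, errno.errorcode.get(err, err))
```

CPython ignores SIGPIPE by default, so the failed write shows up as an exception and does not kill the process. A C program would need to ignore SIGPIPE or pass MSG_NOSIGNAL to get the same behavior.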

Common Causes

1. Slow or stalled consumer with a full TCP buffer

A consumer that stops reading from its socket (because it is blocked on disk, a lock, or a long-running handler) lets its receive buffer fill, and the broker's send buffer backs up behind it. When that consumer finally dies or closes the connection, the broker's next write hits a closed socket.

2. Consumer process vanished mid-delivery

The broker is streaming deliveries when the consumer is killed (OOM, deploy, crash). The broker's next write to that socket fails with epipe.

3. Network drop during a large write

A network blip, VPN flap, or path failure during transmission of a large message or a burst of deliveries closes the connection while the broker is mid-write.

4. Middlebox closed the connection one-directionally

A load balancer or firewall that half-closes or RSTs the flow can leave the broker writing into a dead pipe until the kernel reports EPIPE.

5. No heartbeats to detect the dead peer earlier

If heartbeats are disabled (a client can turn them off by negotiating a heartbeat timeout of 0), the broker only learns the peer is gone when it tries to write. Failures then surface as epipe rather than as a cleaner heartbeat timeout.
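Causes 1 and 2 can be simulated without a broker. The sketch below again substitutes a local Unix socket pair for the TCP connection, which is an assumption for portability. The "consumer" never reads, so the "broker" side fills its buffer until a non-blocking send would block. The consumer then closes with data still unread, and the broker's next write fails. On Linux this is EPIPE. Which errno appears can vary with platform and timing, though, so treat EPIPE as typical rather than guaranteed.

```python
import errno
import socket


def fill_then_break(chunk_size: int = 4096) -> tuple[int, int]:
    """Simulate a consumer that stops reading, then dies with data still unread.

    Returns (bytes the 'broker' queued before its buffer filled, errno of the next write).
    """
    broker, consumer = socket.socketpair()
    broker.setblocking(False)          # report a full buffer instead of hanging
    chunk = b"x" * chunk_size          # stand-in for one delivery frame
    queued = 0
    try:
        while True:                    # the consumer never calls recv()
            queued += broker.send(chunk)
    except BlockingIOError:
        pass                           # buffer full: further writes would block
    consumer.close()                   # consumer killed (OOM, deploy, kill -9)
    try:
        broker.send(chunk)
    except OSError as exc:
        return queued, exc.errno
    finally:
        broker.close()
    return queued, 0


if __name__ == "__main__":
    queued, err = fill_then_break()
    print(f"queued {queued} bytes before stalling; next write: {errno.errorcode.get(err, err)}")
```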

How to Reproduce the Error

Queue a large backlog, attach a consumer that stops reading its socket, then kill the consumer while the broker is still delivering to it:

# Publish a backlog the consumer cannot drain, then kill the consumer mid-stream.
# Note: the default "guest" user can only connect from localhost; from another
# host, use an account that is allowed to connect remotely.
python3 - <<'PY'
import pika
conn = pika.BlockingConnection(pika.URLParameters("amqp://guest:guest@10.0.4.21:5672/%2F"))
ch = conn.channel()
ch.queue_declare(queue="epipe-test")
for _ in range(50000):
    ch.basic_publish(exchange="", routing_key="epipe-test", body=b"x" * 4096)  # ~200 MB backlog
conn.close()
print("backlog queued; start a consumer, then SIGKILL it while deliveries are still flowing")
PY
# In another shell, start a consumer and immediately: kill -9 <consumer-pid>

If the consumer is killed with kill -9 while the broker is still streaming the backlog to it, the broker's writer typically fails with {writer, send_failed, {error, epipe}}. Depending on timing, the same event can instead be logged as econnreset.

Diagnostic Commands

# Find broken-pipe events plus the preceding "closing AMQP connection" line that names the peer
sudo journalctl -u rabbitmq-server --no-pager | grep -B1 -E 'epipe|send_failed' | tail -20

# Identify consumers and their unacked backlog (slow consumers)
rabbitmqctl list_consumers queue_name channel_pid ack_required prefetch_count

# Show per-connection send pending and state
rabbitmqctl list_connections name peer_host state send_pend recv_cnt send_cnt

# Spot queues with large ready/unacked counts feeding slow consumers
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Watch OS-level send-queue backup on the broker's AMQP sockets
sudo ss -tnm state established '( sport = :5672 )' | grep -A1 ':5672' | head
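To rank which clients hit the error most often, the sketch below counts epipe closures per client host, vhost, and user. It assumes the log format shown in the Exact Error Message section: a "closing AMQP connection" line followed by its reason line. It also assumes IPv4 peer addresses. Feed it the unfiltered log, because the peer address is on the line before the epipe reason.

```python
import re
import sys
from collections import Counter

# Matches the "closing AMQP connection" line format shown earlier (IPv4 peers only).
CLOSING = re.compile(
    r"closing AMQP connection <[\d.]+> \((?P<peer>[^:]+):(?P<port>\d+) -> [^,]+, "
    r"vhost: '(?P<vhost>[^']*)', user: '(?P<user>[^']*)'\)"
)


def epipe_peers(lines) -> Counter:
    """Count (peer_host, vhost, user) for closures whose reason line mentions epipe."""
    counts = Counter()
    pending = None  # last "closing" line seen, waiting for its reason line
    for line in lines:
        m = CLOSING.search(line)
        if m:
            pending = (m["peer"], m["vhost"], m["user"])
            continue
        if pending is not None:
            if "epipe" in line:
                counts[pending] += 1
            pending = None
    return counts


if __name__ == "__main__":
    for (peer, vhost, user), n in epipe_peers(sys.stdin).most_common():
        print(f"{n:5d}  {peer}  vhost={vhost}  user={user}")
```

Usage, with a hypothetical file name: `sudo journalctl -u rabbitmq-server --no-pager | python3 epipe_peers.py`.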

Step-by-Step Resolution

Step 1: Confirm it is a write-side failure

{writer, send_failed, {error, epipe}} confirms that the broker failed while writing to the client. On a consuming connection, that points you at the delivery path rather than the publish path.

Step 2: Find the affected consumers

Match the connection pid/peer in the log to a consumer. Check that consumer’s queue for a large messages_unacknowledged count — a sign it was too slow and built up in-flight deliveries.
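The sketch below automates the unacknowledged-count check. It assumes your rabbitmqctl version supports `--formatter json` and uses a hypothetical alert threshold of 1,000 unacknowledged messages. It lists queues whose messages_unacknowledged exceeds the threshold, largest first.

```python
import json
import subprocess

UNACKED_LIMIT = 1000  # hypothetical threshold; size it to prefetch_count x consumers


def suspect_queues(json_text: str, limit: int = UNACKED_LIMIT):
    """Return (name, unacked, consumers) for queues above the limit, largest first."""
    rows = json.loads(json_text)
    hits = [
        (r["name"], r["messages_unacknowledged"], r["consumers"])
        for r in rows
        if r.get("messages_unacknowledged", 0) > limit
    ]
    return sorted(hits, key=lambda t: t[1], reverse=True)


def main() -> None:
    out = subprocess.run(
        ["rabbitmqctl", "-q", "list_queues", "--formatter", "json",
         "name", "messages_ready", "messages_unacknowledged", "consumers"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name, unacked, consumers in suspect_queues(out):
        print(f"{name}: {unacked} unacked across {consumers} consumer(s)")


if __name__ == "__main__":
    main()
```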

Step 3: Set a sensible prefetch (QoS)

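Unbounded prefetch, which is what a consumer gets if it never sets QoS, lets the broker push thousands of deliveries into a consumer that cannot keep up, filling buffers. Set basic.qos with a modest prefetch_count (channel.basic_qos(prefetch_count=N) in pika) so the broker stops sending to that consumer once N deliveries are unacknowledged. Prefetch only applies with manual acknowledgements. With auto-ack, the broker sends as fast as the socket allows.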

Step 4: Make consumers acknowledge and read promptly

Ensure handlers do not block the I/O loop: offload slow work, and acknowledge each message as soon as it has been processed. A consumer that stops reading its socket will eventually lose its connection, typically with epipe on the broker side.

Step 5: Enable heartbeats and reconnect logic

Heartbeats (30-60s) let the broker detect a dead peer before a large write fails, and client auto-reconnect re-establishes delivery cleanly after a drop.

Prevention and Best Practices

Always set a bounded prefetch_count so the broker never streams more in-flight messages than a consumer can drain.
Keep consumer message handlers fast and non-blocking; move heavy work off the connection’s I/O loop.
Enable heartbeats so dead peers are detected proactively instead of on the next write.
Size messages sensibly and avoid pushing huge payloads to consumers that may stall mid-receive.
Monitor messages_unacknowledged per queue and alert when it grows unbounded — that is the leading indicator of a slow consumer headed for a broken pipe.

Related Errors

{socket_error, econnreset} - the read-side equivalent; the broker learns the peer is gone while reading rather than writing.
missed heartbeats, timeout - proactive detection of a dead peer before a write fails.
connection.blocked / resource alarm - publishers (not consumers) being throttled, a different flow-control path.
Error on AMQP connection ... state: running - the generic lifecycle line that wraps the epipe reason.


Frequently Asked Questions

Is broken pipe the same as connection reset? They are siblings. Both mean the peer is gone, but epipe is detected on a write (the broker had data to send) while econnreset is detected on a read (an RST arrived). Broken pipe therefore concentrates on the delivery path to consumers.

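Why does this mostly hit consumers, not publishers? Because most of the broker's writes are deliveries to consumers. When a consumer stalls or dies while the broker is sending, the broker's write fails with epipe. Publishers mostly write rather than receive, so when their connection dies they usually see the failure on their own side instead: a broken pipe in the client library, as in the pika BrokenPipeError example above, or a reset.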

Will setting prefetch really help? Yes, significantly. Without QoS the broker can push a large in-flight backlog into a slow consumer, filling kernel buffers until a write fails. A bounded prefetch_count keeps in-flight volume matched to consumer speed.

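Does a broken pipe lose messages? Not if consumers use manual acknowledgements (ack_required = true). Deliveries that were sent but not yet acknowledged are requeued when the connection closes and redelivered, with the redelivered flag set, to the same or another consumer. With automatic acknowledgement, messages count as delivered once the broker sends them, so any in flight when the connection dies are lost.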

How do heartbeats reduce broken-pipe errors? Heartbeats let the broker notice a dead peer during idle periods and close cleanly with a heartbeat timeout, rather than discovering the dead socket only when it attempts a large write that fails with epipe.
