RabbitMQ Error Guide: '{socket

Exact Error Message

A reset-by-peer error means the other end of an established TCP connection sent an RST packet, terminating the socket abruptly. Unlike a refused connection, this happens on a connection that was already open and working. RabbitMQ surfaces it on the server side as a socket_error and clients see ECONNRESET.

2026-06-29 09:41:55.118 [warning] <0.22417.3> closing AMQP connection <0.22417.3> (10.0.6.42:40122 -> 10.0.4.21:5672, vhost: 'prod', user: 'orders-svc'):
{socket_error,econnreset}
2026-06-29 09:41:55.119 [error] <0.22417.3> error on AMQP connection <0.22417.3> (10.0.6.42:40122 -> 10.0.4.21:5672, state: running):
{inet_error,econnreset}

On the client side (for example, a Python/pika consumer) it looks like:

pika.exceptions.StreamLostError: Stream connection lost: ConnectionResetError(104, 'Connection reset by peer')

What the Error Means

econnreset is the Erlang/POSIX code for “connection reset by peer.” It means a TCP RST arrived on a socket that was already established. The connection had completed the AMQP handshake (note state: running above) and was carrying traffic when the peer or an intermediary forcibly tore it down.

This is a transport-layer event, not an AMQP-level close — there is no Connection.Close method exchanged. Something killed the TCP socket out from under the protocol. The “peer” that sent the RST is not always the client; it is frequently a middlebox (load balancer, NAT gateway, or firewall) sitting between the client and broker.

Common Causes

1. Load balancer or proxy idle timeout

An L4 load balancer (AWS NLB, HAProxy, nginx stream) in front of RabbitMQ silently closes connections idle beyond its timeout, then RSTs subsequent traffic. AMQP connections are long-lived and often idle between messages, so they hit this constantly.

2. Client process crashed or was killed

If the client is OOM-killed, segfaults, or its container is SIGKILLed, the OS sends RST for the orphaned socket. This is the most common direct-client cause.

3. Stateful firewall / NAT dropped the flow

A stateful firewall or cloud NAT gateway evicts an idle flow from its connection table, then sends RST when traffic resumes on the now-unknown flow.

4. Heartbeat disabled while a middlebox enforces idle timeouts

With heartbeats off (heartbeat=0), nothing keeps the connection warm, so any idle-timeout device in the path resets it during quiet periods.

5. Abrupt broker-side resource action

A node under a memory/disk alarm or being shut down can drop connections; from the client’s view this also appears as a reset.

How to Reproduce the Error

Open a connection, let it idle past a deliberately short proxy timeout, then send traffic:

# With an L4 proxy (e.g., HAProxy timeout client 30s) in front of 5672,
# open a consumer with heartbeats disabled and let it sit idle.
python3 - <<'PY'
import pika, time
params = pika.URLParameters("amqp://guest:guest@proxy-host:5672/%2F?heartbeat=0")
conn = pika.BlockingConnection(params)
ch = conn.channel()
ch.queue_declare(queue="reset-test")
print("idling past the proxy timeout...")
time.sleep(60)            # exceed the proxy idle timeout
ch.basic_publish("", "reset-test", b"hi")   # RST arrives here
PY

The publish after the idle period raises ConnectionResetError(104) and the broker logs {socket_error, econnreset}.

Diagnostic Commands

# Count reset events in the broker log over time
sudo journalctl -u rabbitmq-server --since "1 hour ago" | grep -c econnreset

# Show the offending connections with peer and state
sudo journalctl -u rabbitmq-server --no-pager | grep -E 'econnreset' | tail -20

# List live connections and how long they have been up
rabbitmqctl list_connections name peer_host peer_port state timeout connected_at

# Inspect negotiated heartbeat per connection (0 = disabled)
rabbitmqctl list_connections name timeout

# Watch TCP resets at the OS level on the broker
sudo ss -tni state established '( dport = :5672 or sport = :5672 )' | head
nstat -az TcpExtTCPAbortOnData TcpExtTCPAbortOnTimeout 2>/dev/null

Step-by-Step Resolution

Step 1: Identify what sits between client and broker

Map the network path. If the client connects through an NLB, HAProxy, nginx stream, or NAT gateway, that intermediary is the prime suspect for idle resets.

Step 2: Enable and tune heartbeats

Set a heartbeat shorter than the smallest idle timeout in the path. A 30s heartbeat keeps the connection active and survives most LB timeouts.

amqp://user:pass@host:5672/%2F?heartbeat=30

Step 3: Raise or remove the middlebox idle timeout

Increase the LB/proxy idle timeout well above the heartbeat interval (for example, HAProxy timeout client/timeout server to 3600s for the AMQP frontend). Ensure the LB does not aggressively reset.

Step 4: Confirm the client is not crashing

Check the client host for OOM kills (dmesg | grep -i oom) and container restarts at the reset timestamps. A client crash needs a client fix, not a network change.

Step 5: Verify TCP keepalives

Enable TCP keepalives on the client and broker so dead peers are detected and idle flows stay in firewall tables. RabbitMQ supports tcp_listen_options.keepalive = true.

Prevention and Best Practices

Always enable AMQP heartbeats (30-60s) for connections that traverse load balancers, NAT, or firewalls; never run heartbeat=0 across a middlebox.
Set LB/proxy idle timeouts higher than the heartbeat interval, and prefer connection draining over hard resets.
Use automatic reconnect with backoff in clients so an occasional reset is self-healing rather than fatal.
Monitor econnreset rate in broker logs and alert on spikes — a sudden jump usually means an LB config change or a crashing client fleet.
Enable TCP keepalives at the OS and broker level to keep flows alive in stateful firewall tables.
For fast triage, the free incident assistant can correlate a reset spike with client and proxy events.

{socket_error, epipe} (broken pipe) — the broker tried to write to an already-closed socket, the outbound counterpart to a read-side reset.
missed heartbeats, timeout — what RabbitMQ logs when heartbeats are on but frames stop; an alternative outcome to a raw RST.
connection_closed_abruptly — a peer that drops TCP during the handshake rather than mid-stream.
ECONNREFUSED — a reset on a new connection attempt, not an established one.

See the RabbitMQ guides for the full set.

Frequently Asked Questions

Is econnreset always the client’s fault? No. The “peer” that sends the RST is whichever device terminates the TCP flow. Very often it is a load balancer, NAT gateway, or firewall in the middle enforcing an idle timeout, not the application client.

Will enabling heartbeats fix this? Usually, when the cause is an idle-timeout middlebox. A heartbeat shorter than the smallest idle timeout keeps the connection active so nothing resets it. It will not help if the client process itself is crashing.

Why do resets cluster during quiet hours? Because AMQP connections sit idle when there is little traffic, and idle is exactly when LB/firewall timeouts fire. With heartbeats off, low-traffic periods are when resets concentrate.

How is reset by peer different from connection refused? Refused (ECONNREFUSED) happens on a brand-new connection attempt — nothing is listening or a firewall rejects. Reset (ECONNRESET) happens on a connection that was already established and working, then forcibly torn down.

Should clients retry automatically on a reset? Yes. Resets are expected occasionally in any distributed system. Clients should reconnect with exponential backoff and re-declare their topology, treating a single reset as transient rather than fatal.

RabbitMQ Error Guide: '{socket_error, econnreset}' Connection Reset by Peer

Exact Error Message

What the Error Means

Common Causes

1. Load balancer or proxy idle timeout

2. Client process crashed or was killed

3. Stateful firewall / NAT dropped the flow

4. Heartbeat disabled while a middlebox enforces idle timeouts

5. Abrupt broker-side resource action

How to Reproduce the Error

Diagnostic Commands

Step-by-Step Resolution

Step 1: Identify what sits between client and broker

Step 2: Enable and tune heartbeats

Step 3: Raise or remove the middlebox idle timeout

Step 4: Confirm the client is not crashing

Step 5: Verify TCP keepalives

Prevention and Best Practices

Frequently Asked Questions

Download the Free 500-Prompt DevOps AI Toolkit

Exact Error Message

What the Error Means

Common Causes

1. Load balancer or proxy idle timeout

2. Client process crashed or was killed

3. Stateful firewall / NAT dropped the flow

4. Heartbeat disabled while a middlebox enforces idle timeouts

5. Abrupt broker-side resource action

How to Reproduce the Error

Diagnostic Commands

Step-by-Step Resolution

Step 1: Identify what sits between client and broker

Step 2: Enable and tune heartbeats

Step 3: Raise or remove the middlebox idle timeout

Step 4: Confirm the client is not crashing

Step 5: Verify TCP keepalives

Prevention and Best Practices

Related Errors

Frequently Asked Questions

Download the Free 500-Prompt DevOps AI Toolkit