Skip to content
DevOps AI ToolKit
Newsletter
All guides
AI for Kafka By James Joyner IV · · 9 min read

Kafka Error Guide: 'java.io.IOException: Connection reset by peer' Broker Reset

Fix Kafka 'Connection reset by peer' — diagnose broker restarts, load balancer and firewall idle resets, and plaintext-to-SSL listener mismatches.

  • #kafka
  • #troubleshooting
  • #errors
  • #network

Exact Error Message

This appears in a Kafka client or broker log when an already-established TCP connection is abruptly torn down by the other side:

[2026-06-29 09:41:22,517] WARN [Consumer clientId=consumer-7 groupId=orders] Connection to node 3 (kafka-3.internal/10.0.4.33:9092) terminated during authentication. This may happen due to any of the following reasons: (1) Firewall blocking Kafka TLS traffic ... (org.apache.kafka.common.network.Selector)
[2026-06-29 09:41:22,520] WARN [Consumer clientId=consumer-7 groupId=orders] Error in I/O with kafka-3.internal/10.0.4.33
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:46)
        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:118)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:551)

What the Error Means

“Connection reset by peer” is the OS reporting that it received a TCP RST packet on a socket that was already open and in use. Unlike “connection refused” (which happens at connect time) or a bootstrap node -1 failure (which happens before metadata), this error means the connection succeeded earlier and was then forcibly closed by the remote end mid-stream.

The “peer” is whatever sent the RST. That is usually the broker itself, but it can also be a middlebox — a load balancer, NAT gateway, or stateful firewall — that decided the flow was no longer valid and reset it. The key signal is that data was flowing or being negotiated, and the far side said “this connection is over, now” rather than closing it gracefully with a FIN.

Because the reset can arrive during the TLS/SASL handshake or during normal traffic, Kafka often pairs it with the “terminated during authentication” hint, which is a common but not exclusive cause.

Common Causes

  • A broker restarted or crashed. Rolling restarts, OOM kills, or controlled.shutdown tear down live connections; clients see RST until they reconnect to a healthy broker.
  • A load balancer or firewall idle-timeout reset. A stateful device drops the flow from its table after an idle period and RSTs the next packet. Kafka’s default connections.max.idle.ms (10 min) can exceed an LB’s idle timeout (often 60–350s), so the device resets first.
  • Plaintext client against an SSL listener (or vice versa). The client sends plaintext bytes to an SSL port; the broker cannot parse the TLS record and resets, surfacing as a reset “during authentication.”
  • TLS handshake failure from an untrusted cert, protocol/cipher mismatch, or SNI issue causes the broker to abort the connection with an RST.
  • Broker overload / request-queue saturation can cause the broker to close connections it cannot service.

How to Reproduce the Error

Establish a healthy consumer, then restart its leader broker:

kafka-console-consumer.sh --bootstrap-server kafka-3.internal:9092 --topic orders --group orders
# in another shell, restart the broker the consumer is connected to

The consumer logs Connection reset by peer and reconnects. To reproduce the mismatch variant, point a plaintext client at an SSL listener (port 9093 configured as SSL://) with no security.protocol=SSL — the broker resets during the handshake.

Diagnostic Commands

All read-only.

Check broker uptime / recent restart to explain a reset:

sudo systemctl status kafka --no-pager | grep -E 'Active|since'

Scan the broker log for shutdowns, OOM, or handshake failures around the timestamp:

sudo journalctl -u kafka --no-pager --since "10 min ago" | grep -iE 'shutdown|terminat|SSL|handshake|OutOfMemory'

Confirm whether the listener expects TLS, and probe it read-only:

openssl s_client -connect kafka-3.internal:9093 -brief </dev/null

If that listener is plaintext, openssl will report a handshake failure, confirming you should not be using SSL there (or vice versa). Verify reachability and protocol with the CLI:

kafka-broker-api-versions.sh --bootstrap-server kafka-3.internal:9092

Check for an idle-reset middlebox by watching how long a connection survives:

ss -tni dst 10.0.4.33:9092

Step-by-Step Resolution

1. Correlate with broker events. Check systemctl status kafka uptime and journalctl around the reset timestamp. If the broker restarted or OOM-killed, the reset is expected churn — clients should reconnect automatically. Fix the underlying restart/OOM cause.

2. Rule out a protocol mismatch. If the reset happens “during authentication” immediately on every connect, you likely have a plaintext/SSL mismatch. Confirm the listener’s protocol from the broker config and probe with openssl s_client. Align the client’s security.protocol with the listener.

3. Check for an idle-timeout middlebox. If resets cluster around a fixed idle interval (e.g., every 60s of inactivity), a load balancer or firewall is resetting idle flows. Lower the client connections.max.idle.ms below the device’s idle timeout, or raise/disable the device’s idle timeout for the Kafka port. Keepalives on producers/consumers also help.

4. Validate TLS material. For handshake resets, confirm the client trusts the broker cert chain and that protocol/cipher versions overlap. openssl s_client -connect host:9093 shows the negotiated chain read-only.

5. Look at broker load. If resets coincide with high request-queue or network-thread saturation, the broker may be shedding connections. Check broker metrics and scale or tune accordingly.

Prevention and Best Practices

  • Set client connections.max.idle.ms lower than any load balancer or firewall idle timeout on the Kafka path so the client recycles connections before a middlebox resets them.
  • Use rolling restarts with controlled.shutdown.enable=true and proper client retries so reconnects are graceful, not error storms.
  • Standardize security.protocol across clients per listener and validate it in config — never let a plaintext client hit an SSL port.
  • Monitor broker RequestQueueSize and network-thread idle ratio to catch overload before it manifests as resets.
  • Treat occasional resets during a deploy as normal; alert only on sustained reset rates.
  • Broken pipe — the local side wrote to a socket the peer already closed; the write-side counterpart to this read-side reset.
  • Node N disconnected — Kafka’s higher-level report that a known broker’s connection dropped, often accompanying a reset.
  • Connection to node -1 ... disconnected — bootstrap-time reset, frequently the plaintext-to-SSL mismatch.
  • SocketTimeoutException — a connect-time timeout, not a reset of an established connection.

Frequently Asked Questions

Who is the “peer” that reset the connection? The remote end of the TCP socket: usually the broker, but possibly a load balancer, NAT, or firewall between client and broker.

Is a single reset a problem? No. Brief resets during a broker restart or deploy are normal — clients reconnect. Worry about a sustained or periodic reset rate.

Why does it say “during authentication” if I am not using auth? That hint covers the early connection phase including TLS. A plaintext client hitting an SSL listener trips it without any real authentication configured.

How do I tell a reset from a refused connection? Refused happens at connect (no established session); reset happens after the connection was already open. The stack trace here is in read/poll, meaning data was flowing.

Could my load balancer be the cause? Yes — if resets recur at a fixed idle interval, an LB or firewall is dropping idle flows. Align timeouts as described above.

Free download · 368-page PDF

Download the Free 500-Prompt DevOps AI Toolkit

500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.

  • 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
  • Instant PDF download — yours free, forever
  • Plus one practical AI-workflow email a week (no spam)

Single opt-in · unsubscribe anytime · no spam.