AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'Broker may not be available' Connection Failure

Fix Kafka 'Connection to node 1 could not be established. Broker may not be available': diagnose down brokers, wrong bootstrap servers, listeners, and firewalls.

  • #kafka
  • #troubleshooting
  • #errors
  • #connectivity

Exact Error Message

This warning floods the logs of a Kafka client (producer, consumer, or admin) when the client cannot open a working connection to a broker:

[2026-06-29 14:02:11,883] WARN [Producer clientId=producer-1] Connection to node 1 (kafka-1.internal/10.0.4.21:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2026-06-29 14:02:12,901] WARN [Producer clientId=producer-1] Bootstrap broker kafka-1.internal:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)

You may also see the older short form simply logged as Broker not available, or a downstream timeout once the client gives up:

org.apache.kafka.common.errors.TimeoutException: Topic orders-v2 not present in metadata after 60000 ms.

The node id of -1 in the bootstrap line is significant: bootstrap entries get synthetic negative ids, so a connection logged with id -1 is one the client opened from its bootstrap.servers list, before it had a metadata response carrying real broker ids.
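To tell bootstrap failures apart from failures against a real broker id at a glance, the two warnings can be parsed mechanically. The following is a minimal sketch, not a complete log parser: the regular expressions are derived only from the sample lines quoted above and may need adjusting for other client versions, and the names `parse_warning` and `BrokerWarning` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Patterns derived from the two sample NetworkClient warnings in this guide.
CONNECT_RE = re.compile(
    r"Connection to node (?P<id>-?\d+) "
    r"\((?P<host>[^/)]*)/(?P<ip>[^:)]+):(?P<port>\d+)\) could not be established"
)
BOOTSTRAP_RE = re.compile(
    r"Bootstrap broker (?P<host>[^:\s]+):(?P<port>\d+) \(id: (?P<id>-?\d+) "
)


class BrokerWarning(NamedTuple):
    node_id: int
    host: str
    port: int
    is_bootstrap: bool  # True when the id is negative (synthetic bootstrap id)


def parse_warning(line: str) -> Optional[BrokerWarning]:
    """Return the node id and endpoint from a warning line, or None if no match."""
    match = CONNECT_RE.search(line) or BOOTSTRAP_RE.search(line)
    if match is None:
        return None
    node_id = int(match.group("id"))
    return BrokerWarning(node_id, match.group("host"), int(match.group("port")), node_id < 0)
```

Feeding client log lines through `parse_warning` and grouping by `is_bootstrap` shows whether the client is failing at bootstrap or only after it has learned real broker ids.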

What the Error Means

This is a transport-layer failure. The Kafka client tried to establish or maintain a TCP connection to a broker, and the connection was refused, timed out, or was dropped mid-handshake. Because it happens before (or during) the API version negotiation, it is almost never a topic, ACL, or consumer-group problem.

The message is a WARN, not a fatal error: the client retries on its reconnect backoff (reconnect.backoff.ms, growing up to reconnect.backoff.max.ms) and will recover on its own the moment a broker becomes reachable. The danger is when it never recovers and the warning repeats indefinitely, eventually surfacing as a TimeoutException because no metadata could be fetched; for a producer, that happens once max.block.ms (60000 ms by default) elapses.
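The retry cadence explains why the warning repeats so quickly. Here is a simplified sketch of an exponential reconnect backoff, assuming the Java client's documented defaults of reconnect.backoff.ms=50 and reconnect.backoff.max.ms=1000 (check the values for your client version). The real client also adds random jitter, which this sketch omits, and the function name `reconnect_delays` is hypothetical.

```python
from typing import List


def reconnect_delays(base_ms: int = 50, max_ms: int = 1000, attempts: int = 8) -> List[int]:
    """Delay before each reconnect attempt: doubles per failure, capped at max_ms."""
    return [min(base_ms * 2 ** n, max_ms) for n in range(attempts)]


print(reconnect_delays())  # [50, 100, 200, 400, 800, 1000, 1000, 1000]
```

Under these assumptions the client settles at roughly one attempt per second per unreachable node, which matches the one-second spacing of the sample warnings.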

Critically, node 1 here is a real broker id that Kafka returned in metadata; bootstrap entries appear with negative ids such as -1. The client may have reached one broker for bootstrap, learned the cluster’s advertised.listeners, and then failed to reach the address those listeners advertised.

Common Causes

  • The broker process is down or restarting. A crashed, OOM-killed, or mid-rolling-restart broker stops accepting connections on 9092.
  • Wrong bootstrap.servers. A typo, stale IP, or pointing at a host that never ran Kafka.
  • advertised.listeners is wrong. Bootstrap succeeds, but the broker advertises an address (e.g. an internal hostname or localhost) the client cannot resolve or route to.
  • Firewall / security group blocks 9092. A DROP rule causes connect timeouts; a REJECT causes instant failures.
  • Listener bound to the wrong interface. The broker listens only on 127.0.0.1, so remote clients cannot connect.
  • Security protocol mismatch. The client speaks PLAINTEXT to an SSL/SASL_SSL listener (or vice versa); the handshake fails and the connection drops.
  • DNS resolution failure for the advertised hostname.
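Two of the causes above, a malformed bootstrap.servers value and a hostname that does not resolve, can be checked before any Kafka client starts. This is a rough sketch under stated assumptions: `parse_bootstrap_servers` and `unresolvable_hosts` are hypothetical helper names, and the parsing covers only the plain `host:port,host:port` form.

```python
import socket
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Split a bootstrap.servers string into (host, port) pairs, rejecting typos."""
    entries = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"malformed bootstrap entry: {item!r}")
        port_num = int(port)
        if not 1 <= port_num <= 65535:
            raise ValueError(f"port out of range in entry: {item!r}")
        entries.append((host, port_num))
    if not entries:
        raise ValueError("bootstrap.servers is empty")
    return entries


def unresolvable_hosts(entries: List[Tuple[str, int]]) -> List[str]:
    """Return the hosts that fail DNS resolution from this machine."""
    failed = []
    for host, port in entries:
        try:
            socket.getaddrinfo(host, port)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Running `unresolvable_hosts(parse_bootstrap_servers(...))` from the client host separates the DNS cause from the firewall and listener causes, which need a connection attempt to detect.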

How to Reproduce the Error

Point a client at a port where no broker listens, or at a stopped broker:

# With a broker stopped (or a port nothing listens on), run any client:
kafka-broker-api-versions.sh --bootstrap-server kafka-1.internal:9092

The client logs repeating “could not be established. Broker may not be available.” warnings, then fails. You can also reproduce the advertised.listeners variant by setting the broker’s advertised address to a hostname the client cannot resolve, then connecting from a remote host.

Diagnostic Commands

Confirm whether a broker is actually answering on the wire:

# From the client host: is the TCP port reachable?
nc -z -v kafka-1.internal 9092
# On the broker host: which address is the listener bound to?
ss -ltnp | grep -E ':9092|:9093'
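Where nc is not installed, the same reachability test can be scripted. The sketch below is a Python stand-in for the nc check, using only the standard library; the function name `probe` and its result labels are hypothetical, chosen to mirror the refused-versus-timeout distinction this guide relies on.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify how it ends."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "dns_failure"   # hostname did not resolve
    except ConnectionRefusedError:
        return "refused"       # nothing listening, or a REJECT rule
    except socket.timeout:
        return "timeout"       # typical of a DROP rule or a host that is down
    except OSError:
        return "unreachable"   # e.g. no route to host
```

For example, `probe("kafka-1.internal", 9092)` run from the client host returns one label per attempt. An "open" result only proves that something accepted the TCP connection, not that it speaks the Kafka protocol.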

Check whether the bootstrap node responds at the protocol level:

# Negotiate API versions; succeeds only if a broker truly answers
kafka-broker-api-versions.sh --bootstrap-server kafka-1.internal:9092

Inspect what the cluster looks like from the client's side:

# KRaft clusters only: quorum leader, voters, and observers
kafka-metadata-quorum.sh --bootstrap-server kafka-1.internal:9092 describe --status
# Cluster id; returns only if a broker answers the request
kafka-cluster.sh cluster-id --bootstrap-server kafka-1.internal:9092

Confirm the resolved address and broker service state:

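# Does the broker hostname resolve from this host?
getent hosts kafka-1.internal
# On the broker host: service state and recent start/stop/bind messages
sudo systemctl status kafka --no-pager | head -8
sudo journalctl -u kafka --since "15 min ago" | grep -iE 'started|shutdown|error|bind'
# Active (uncommented) listener settings; adjust the path to your install
grep -E '^(advertised\.)?listeners=' /opt/kafka/config/server.properties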
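The listener settings pulled from server.properties can also be checked automatically for the two misconfigurations this guide describes: a listener bound only to loopback, and an advertised address that remote clients cannot use. This is a minimal sketch that handles only the `NAME://host:port` form; the helper names `read_properties`, `listener_hosts`, and `lint_listeners` are hypothetical.

```python
from typing import Dict, List

LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def read_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def listener_hosts(value: str) -> Dict[str, str]:
    """Map each listener name to its host, e.g. {'PLAINTEXT': 'kafka-1.internal'}."""
    hosts = {}
    for item in value.split(","):
        name, sep, rest = item.strip().partition("://")
        if sep:
            hosts[name] = rest.rpartition(":")[0].strip("[]")
    return hosts


def lint_listeners(text: str) -> List[str]:
    """Return human-readable warnings for risky listener settings."""
    props = read_properties(text)
    problems = []
    for name, host in listener_hosts(props.get("listeners", "")).items():
        if host in LOOPBACK:
            problems.append(f"listeners: {name} binds only to {host}")
    advertised = props.get("advertised.listeners")
    if advertised is None:
        problems.append("advertised.listeners is not set explicitly")
    else:
        for name, host in listener_hosts(advertised).items():
            if host in LOOPBACK or host == "0.0.0.0":
                problems.append(f"advertised.listeners: {name} advertises {host}")
    return problems
```

Passing the file contents, for instance `lint_listeners(open("/opt/kafka/config/server.properties").read())`, returns an empty list when neither pattern is present. It does not prove the advertised name is routable from every client network.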

Step-by-Step Resolution

  1. Determine if it is a down broker or a reachability problem. Run nc -z -v <host> 9092 from the client host. A timeout points at a firewall DROP or a host that is down; an instant refusal points at no listener on that port (for example, a stopped broker process) or a REJECT rule.
  2. Verify the broker is up. On the broker host, run sudo systemctl status kafka and sudo journalctl -u kafka to confirm it started and is not crash-looping. Restart if needed.
  3. Confirm the listener binding. Run ss -ltnp | grep 9092 on the broker. If it shows 127.0.0.1:9092, remote clients cannot connect; fix listeners to bind a reachable interface.
  4. Check advertised.listeners. This must be an address every client can resolve and route to. If bootstrap works but node 1 keeps failing, the advertised address is the culprit. Update it and restart the broker.
  5. Match the security protocol. Ensure the client’s security.protocol matches the listener (PLAINTEXT vs SSL vs SASL_SSL) and uses the correct port.
  6. Open the firewall. Allow the client CIDR to 9092/9093 in iptables and any cloud security group.
  7. Validate. Re-run kafka-broker-api-versions.sh --bootstrap-server <host>:9092; a clean version list means the path is healthy.
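The decision logic in the steps above can be written down as a small lookup, which is handy in a runbook or chat-ops helper. This sketch only encodes the rules of thumb from this guide and is not a diagnosis engine. The function name `likely_cause` and its input labels ("open", "refused", "timeout", "dns_failure") are hypothetical.

```python
from typing import Optional


def likely_cause(tcp_result: str, bootstrap_ok: bool = False,
                 failing_node_id: Optional[int] = None) -> str:
    """Map observations from the steps above to the most likely cause."""
    if tcp_result == "dns_failure":
        return "DNS: the hostname does not resolve from the client"
    if tcp_result == "timeout":
        return "firewall DROP or host down"
    if tcp_result == "refused":
        return "no listener on that port, or a firewall REJECT rule"
    if tcp_result != "open":
        raise ValueError(f"unknown tcp_result: {tcp_result!r}")
    if not bootstrap_ok:
        # TCP connects but the API version request fails: suspect the handshake.
        return "security protocol or port mismatch"
    if failing_node_id is not None and failing_node_id >= 0:
        # Bootstrap works, yet a real broker id is unreachable.
        return "advertised.listeners points at an unreachable address"
    return "path looks healthy"
```

Here `bootstrap_ok` stands for whether kafka-broker-api-versions.sh returned a version list, and `failing_node_id` is the node id from the client warning, if any.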

Prevention and Best Practices

  • Set advertised.listeners explicitly to a stable, resolvable name (DNS or VIP), never localhost, in any multi-host deployment.
  • List multiple brokers in bootstrap.servers so a single down broker does not block bootstrap.
  • Manage firewall rules through configuration management so a baseline reapply cannot silently drop the 9092 allow rule.
  • Add a synthetic check that runs kafka-broker-api-versions.sh from a client-network host and alerts on failure, catching this before the application fleet does.
  • Keep listener and security-protocol settings documented per environment so clients never default to the wrong port.

Related Errors

  • kafka.errors.NoBrokersAvailable: the client-side (kafka-python) equivalent when no bootstrap broker can be reached.
  • BrokerEndPointNotAvailableException: the broker has no endpoint for the requested listener name or security protocol.
  • TimeoutException: Topic ... not present in metadata: the downstream failure when this warning never resolves.
  • LeaderNotAvailableException: the broker is reachable but a partition has no leader yet.
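The synthetic check recommended under Prevention and Best Practices could be wrapped as follows. This sketch assumes kafka-broker-api-versions.sh is on the PATH of the monitoring host and exits non-zero when it cannot reach a broker, which is worth verifying on your Kafka version; the function name `broker_answers` is hypothetical.

```python
import subprocess
from typing import Sequence


def broker_answers(command: Sequence[str], timeout_s: float = 15.0) -> bool:
    """Run a check command; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung tool or a missing binary both count as a failed check.
        return False
    return result.returncode == 0


# Example invocation (hostname taken from this guide's samples):
# ok = broker_answers(
#     ["kafka-broker-api-versions.sh", "--bootstrap-server", "kafka-1.internal:9092"]
# )
```

Scheduling this from a host on the client network and alerting when it returns False gives an early signal without depending on application logs.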

Frequently Asked Questions

Is “Broker may not be available” fatal? No. It is a WARN and the client retries automatically. It only becomes a problem if it never recovers, which then surfaces as a metadata TimeoutException.

Bootstrap works but node 1 keeps failing — why? Bootstrap reaches one broker and learns the cluster’s advertised.listeners. If those advertised addresses are not resolvable or routable from the client, subsequent connections to specific node ids fail. Fix advertised.listeners.

Why does node id show as -1? -1 is the synthetic id for a bootstrap entry before metadata is fetched. Seeing it means the client never completed a metadata exchange with any broker.

Could this be an authentication problem? Not directly — this error occurs before or during the connection handshake. A security-protocol mismatch can cause the connection to drop and produce this warning, but credentials/ACLs are evaluated later and produce different errors.

How do I tell a down broker from a firewall block? A connect timeout suggests a firewall DROP or a host that is down; an instant “connection refused” suggests no listener bound or a REJECT rule. Test with nc -z -v from the client host.
