AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'NodeExistsException for /brokers/ids/1' Broker Registration Conflict

Fix Kafka KeeperException NodeExistsException for /brokers/ids: resolve duplicate broker.id, stale ephemeral nodes, session-timeout races, and cloned VM images.

  • #kafka
  • #troubleshooting
  • #errors
  • #zookeeper

Exact Error Message

[2026-06-29 14:02:11,455] ERROR Error while creating ephemeral at /brokers/ids/1, node already exists and owner '72057912345678901' does not match current session '72057998765432109' (kafka.zk.KafkaZkClient$CheckedEphemeral)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:122)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904)
        at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842)
        at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:320)
[2026-06-29 14:02:11,460] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/1. This probably indicates that you either configured a brokerid that is already in use, or else you shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
[2026-06-29 14:02:11,612] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)

Several messages typically accompany this one failure: Error while creating ephemeral node, Failed to register broker 1 in ZooKeeper, and the broker’s own shutdown notice once it cannot complete registration.

What the Error Means

When a Kafka broker starts, it creates an ephemeral znode at /brokers/ids/<broker.id> to register itself. Ephemeral znodes are tied to the broker’s ZooKeeper session: when the session ends, ZooKeeper automatically deletes the node. NodeExistsException means a node already lives at that path and it is owned by a different session than the one the starting broker holds. ZooKeeper refuses to let two owners claim the same registration path, so the broker fails startup and shuts down.

In short: something else already occupies /brokers/ids/1. Either two brokers truly share the same broker.id, or the previous incarnation of this broker has not yet been cleaned up by ZooKeeper.
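
The owner mismatch described above can be checked by hand. The sketch below assumes that the ZooKeeper CLI prints ephemeralOwner in hexadecimal in its stat block, while the Kafka log line prints session IDs in decimal (as in the error message at the top of this guide). The function names are hypothetical helpers, not part of Kafka or ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helpers: compare a znode's ephemeralOwner with a session id
# taken from the broker log.

# Read a ZooKeeper stat block on stdin and print the ephemeralOwner value
# (assumed format: "ephemeralOwner = 0x<hex digits>").
ephemeral_owner_hex() {
    sed -n 's/^ephemeralOwner = \(0x[0-9a-fA-F]*\).*$/\1/p'
}

# Convert a hex session id (0x...) to the decimal form seen in the Kafka log.
session_hex_to_dec() {
    printf '%d\n' "$1"
}

# $1 = owner in hex (from the stat block)
# $2 = session id in decimal (from the broker log)
compare_owner() {
    owner_dec=$(session_hex_to_dec "$1")
    if [ "$owner_dec" = "$2" ]; then
        echo "same-session"
    else
        echo "different-session"
    fi
}
```

A "different-session" result for the failing broker is exactly the condition that produces NodeExists. To feed it real data, pipe the stat output of the znode into ephemeral_owner_hex; on a ZooKeeper 3.5+ client that likely means get -s /brokers/ids/1 rather than plain get.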

This guide applies only to legacy, ZooKeeper-based Kafka clusters. KRaft-mode clusters do not use ZooKeeper; brokers register through the controller quorum and there are no ephemeral /brokers/ids znodes, so NodeExists for /brokers/ids/... does not occur there. On KRaft, duplicate node identity surfaces differently and is inspected with kafka-metadata-quorum.sh.

Common Causes

  • A stale ephemeral node from a previous session. The broker restarted faster than zookeeper.session.timeout.ms, so the old session has not expired yet and its ephemeral /brokers/ids/1 still lingers. The new process holds a new session, sees the old node, and gets NodeExists. This is the classic restart race.
  • Two brokers with the same broker.id. The textbook cause. Two distinct hosts are both configured with broker.id=1. Whichever registers first owns the path; the second fails to register and shuts down.
  • Split-brain after a network partition. A broker lost its ZooKeeper session during a partition; when connectivity returns it tries to re-register while the old ephemeral node, owned by a session ZooKeeper has not yet reaped, is still present.
  • A cloned VM or container image reused with the same broker.id. Baking broker.id into a base image and then scaling out produces several brokers that all believe they are broker 1.
  • Reused meta.properties / data directory. Copying a broker’s log directory (which carries the broker ID) onto a second host makes it claim the same identity.
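
The duplicate-ID causes above (shared broker.id, cloned images, copied data directories) can be caught with a simple inventory check. The following is a minimal sketch that assumes you can already produce a list of "host broker_id" pairs, one per line, from your own inventory or configuration management; the function name and the input format are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read "host broker_id" pairs on stdin and print every
# broker id claimed by more than one host, followed by the claiming hosts.
duplicate_broker_ids() {
    awk '
        NF >= 2 {
            # Append this host to the comma-separated list for its broker id.
            hosts[$2] = (hosts[$2] == "" ? $1 : hosts[$2] "," $1)
            count[$2]++
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ": " hosts[id]
        }
    ' | sort
}
```

For example, feeding it the three lines "kafka1 1", "kafka2 2", and "kafka3 1" prints "1: kafka1,kafka3"; empty output means every ID in the inventory is unique.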

How to Reproduce the Error

The fastest reproduction is a duplicate ID. Configure two brokers on two hosts both with broker.id=1, point both at the same zookeeper.connect, and start them. The first registers at /brokers/ids/1; the second throws NodeExistsException and shuts down.

To reproduce the stale-node race: set a long zookeeper.session.timeout.ms (say 30000), kill -9 a broker, and immediately restart it. Because the old session has not expired, the old ephemeral node still exists and the restart fails with NodeExists until the session times out.
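
To make the timing of this race concrete, the sketch below estimates how long a restart would still collide with the old session. It is only an approximation: ZooKeeper decides expiry on the server side, the effective timeout is negotiated within the ensemble's configured minimum and maximum, and expiry can land slightly later than the nominal value. The function name and argument order are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the seconds left until the old session can
# expire and its ephemeral node can disappear.
#   $1 = epoch seconds at which the old broker process died
#   $2 = session timeout in milliseconds (zookeeper.session.timeout.ms)
#   $3 = current epoch seconds
stale_node_wait_seconds() {
    died=$1
    timeout_ms=$2
    now=$3
    # Round the timeout up to whole seconds.
    timeout_s=$(( (timeout_ms + 999) / 1000 ))
    remaining=$(( died + timeout_s - now ))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}
```

With the 30000 ms timeout from the reproduction above, a broker killed 10 seconds ago yields an estimate of 20 seconds, during which a restart is expected to fail with NodeExists.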

Diagnostic Commands

All commands below are read-only. None of them create, delete, or modify znodes.

# Which broker IDs are currently registered?
zookeeper-shell.sh zk1:2181 ls /brokers/ids

# Who owns /brokers/ids/1 right now? Inspect host/endpoint and ephemeralOwner
zookeeper-shell.sh zk1:2181 get /brokers/ids/1
# Note the listener endpoints in the JSON, and the ephemeralOwner session id
# printed in the stat block - that session id identifies the current owner.
# With a ZooKeeper 3.5+ client, plain get prints only the data; use
# "get -s /brokers/ids/1" to include the stat block.

# Check the controller, in case the conflict involves controller election
zookeeper-shell.sh zk1:2181 get /controller

# Confirm whether the supposed owner is actually serving traffic
kafka-broker-api-versions.sh --bootstrap-server kafka1:9092 | head -5

# What broker.id does THIS host believe it is?
# meta.properties lives in each log.dirs directory; adjust the path to match.
grep -E '^broker\.id' /etc/kafka/server.properties
grep -E '^broker\.id' /var/lib/kafka/meta.properties

# Pull the registration failure and ephemeral details from the broker log
journalctl -u kafka --no-pager | grep -iE 'NodeExists|ephemeral|register' | tail -30

# ZooKeeper side and session timeout
journalctl -u zookeeper --no-pager | tail -20
grep -E 'session.timeout' /etc/kafka/server.properties

# Read-only four-letter words to confirm the ensemble is healthy
# (ZooKeeper 3.5+ answers only commands listed in 4lw.commands.whitelist)
echo ruok | nc zk1 2181
echo stat | nc zk1 2181

The decisive check is get /brokers/ids/1: look at the endpoints/host in the JSON payload. If they point at a different host than the broker that is failing, you have a duplicate broker.id. If they point at the same host that just restarted, you have a stale ephemeral node from the prior session.
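
The comparison described above can be sketched as two small shell functions. This assumes the znode payload is compact JSON with an "endpoints" array whose entries look like PROTOCOL://host:port, and that the host form (short name or FQDN) matches what you pass in as the failing host. The function names are hypothetical, and the sed pattern does not handle IPv6 literals.

```shell
#!/bin/sh
# Hypothetical helpers: decide between "duplicate broker.id" and
# "stale ephemeral node" from the /brokers/ids/<id> payload.

# Read the znode JSON on stdin and print the host of the first endpoint.
registered_host() {
    sed -n 's|.*"endpoints":\["[^"]*://\([^:"]*\):[0-9]*".*|\1|p'
}

# $1 = host registered in ZooKeeper, $2 = host that fails to start
classify_conflict() {
    registered=$1
    failing=$2
    if [ -z "$registered" ]; then
        echo "unknown"
    elif [ "$registered" = "$failing" ]; then
        echo "stale-ephemeral-node"
    else
        echo "duplicate-broker-id"
    fi
}
```

In practice the input would come from piping the output of get /brokers/ids/1 into registered_host; lines that are not the JSON payload are ignored because sed prints only the matching line. Treat "unknown" as a prompt to read the payload by eye rather than as a verdict.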

Step-by-Step Resolution

  1. Determine which scenario you are in. Run zookeeper-shell.sh zk1:2181 get /brokers/ids/1 and read the endpoint/host in the JSON. Compare it to the host that is failing to start.

  2. If the endpoint belongs to a different host: you have a duplicate broker.id. Assign every broker a unique ID. Pick the host whose ID must change, set a unique broker.id in its server.properties (and reconcile meta.properties in its data directory, which also stores the ID), and restart that broker. Each broker must own exactly one ID across the whole cluster.

  3. If the endpoint belongs to the same host that just restarted: it is a stale ephemeral node. The old session simply has not expired. The cleanest fix is to wait out zookeeper.session.timeout.ms; once ZooKeeper reaps the dead session, it deletes the ephemeral node automatically and the broker registers on its next start attempt. If the previous broker process is in fact still running on that host (for example, a hung JVM), stop it cleanly so that its session closes and the node is released, then start the broker again.

  4. Verify the endpoint before taking any destructive action. Always confirm with get /brokers/ids/1 that the owner is genuinely dead (its host is down or its session is gone) before considering any manual cleanup. Acting while the real owner is alive will take a healthy broker out of the cluster.

  5. Do NOT blindly delete the znode. Deleting /brokers/ids/1 while the legitimate owner still holds the session removes a live broker’s registration, causing partition leadership churn and ISR shrink. Manual deletion is a last resort only when you have positively confirmed the owner is gone and the session will not be reaped on its own.

  6. For split-brain after a partition, restore network connectivity first and let sessions settle. The expired session’s ephemeral node will clear automatically; then the recovered broker re-registers.

After resolving, confirm with zookeeper-shell.sh zk1:2181 ls /brokers/ids that each expected broker ID is present exactly once.
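
That final confirmation can be sketched as a check of the ls result against the IDs you expect. Because znode names are unique under a parent, the listing cannot contain an ID twice, so the useful question is whether any expected ID is missing. The sketch assumes the listing has the bracketed form "[1, 2, 3]"; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each expected broker id that is absent from an
# "ls /brokers/ids" result.
#   $1 = expected ids, space separated, e.g. "1 2 3"
#   $2 = listing as printed by the shell, e.g. "[1, 2, 3]"
missing_broker_ids() {
    expected=$1
    listing=$2
    # Turn "[1, 2, 3]" into "1 2 3" by replacing brackets and commas.
    registered=$(printf '%s\n' "$listing" | sed 's/[][,]/ /g')
    for id in $expected; do
        found=no
        for r in $registered; do
            if [ "$id" = "$r" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$id"
        fi
    done
}
```

Empty output means every expected broker is registered. When capturing real output, keep only the line that holds the bracketed list, since the shell also prints connection messages.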

Prevention and Best Practices

  • Allocate broker.id from a central, unique source and never bake a fixed ID into a base image or container template; derive it from host identity or use broker ID generation rather than cloning.
  • Size zookeeper.session.timeout.ms so that a crashed broker’s old session expires before its restart reaches registration, or rely on graceful shutdown so the session closes immediately and the ephemeral node is released cleanly.
  • Always shut brokers down gracefully (SIGTERM, not kill -9) so the ZooKeeper session closes and the ephemeral registration is removed at once.
  • Never copy a broker’s data directory or meta.properties to another host without changing the ID; the stored ID travels with the directory.
  • Add a startup check or inventory audit that asserts every broker ID is unique across the cluster before scaling out.
  • Consider migrating to KRaft mode, which eliminates ZooKeeper ephemeral registration and this conflict entirely.

Related Errors

  • KeeperException$NoNodeException: NoNode for /brokers/ids/1 — the inverse: the registration path is missing, usually a wrong chroot or wrong ensemble.
  • kafka.common.InconsistentClusterIdException — the broker’s stored cluster ID disagrees with ZooKeeper, common after reusing a data directory across clusters.
  • KeeperException$SessionExpiredException — the broker’s ZooKeeper session expired (the very event that leaves a stale ephemeral node behind for a moment).
  • org.apache.kafka.common.errors.ControllerMovedException — controller election churn that can accompany broker registration conflicts.

Frequently Asked Questions

How do I tell a duplicate broker.id from a stale ephemeral node? Run get /brokers/ids/1 and read the host/endpoint in the JSON. If it points at a different host than the one failing to start, it is a duplicate ID. If it points at the same host that just restarted, it is a stale ephemeral node from the prior session.

Can I just delete /brokers/ids/1 to make the broker start? Only if you have confirmed the real owner is dead. Deleting the znode while a live broker owns it removes that broker from the cluster and triggers leadership and ISR churn. Prefer waiting out the session timeout or restarting the offending broker.

How long until a stale ephemeral node disappears on its own? Up to zookeeper.session.timeout.ms after the owning session stops heartbeating. Once ZooKeeper reaps the dead session it deletes the ephemeral node automatically, and the broker can register on its next start.

Why does graceful shutdown avoid this on restart? A graceful shutdown closes the ZooKeeper session immediately, so the ephemeral node is deleted right away. A kill -9 leaves the session to time out, which is exactly the window in which a fast restart hits NodeExists.

Does this happen on KRaft clusters? No. KRaft has no ZooKeeper and no ephemeral /brokers/ids nodes. Broker identity conflicts surface through the controller quorum instead and are inspected with kafka-metadata-quorum.sh.
