DevOps AI ToolKit
AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'This is not the correct controller for this cluster' (Controller Moved)

Fix Kafka 'not the correct controller / controller moved to another broker': understand normal failover, stale controllers, and how to confirm the real active controller.

  • #kafka
  • #troubleshooting
  • #errors
  • #controller

Exact Error Message

A client, admin tool, or peer broker contacts a broker that used to be the controller but no longer is, and you see the controller-moved error in server.log:

[2026-06-28 14:02:11,447] INFO [Controller id=1] Controller shutting down (kafka.controller.KafkaController)
[2026-06-28 14:02:11,512] INFO [Controller id=1] Resigned (kafka.controller.KafkaController)
[2026-06-28 14:02:12,003] WARN [Broker id=1] Received a request for a topic operation but this
 broker is no longer the active controller. This is not the correct controller for this cluster.
 (kafka.server.KafkaApis)
[2026-06-28 14:02:12,118] INFO [Controller id=3] 3 successfully elected as the controller. Epoch
 incremented to 27 (kafka.controller.KafkaController)

On the client side, the same condition usually surfaces as a retriable exception:

org.apache.kafka.common.errors.NotControllerException: This is not the correct controller for this cluster.

What the Error Means

In a Kafka cluster exactly one node acts as the active controller at any time. The controller owns cluster metadata operations: leader election, topic create/delete, partition reassignment, and ISR changes. In a ZooKeeper-based cluster the controller is the broker that first creates the ephemeral /controller znode; in a KRaft cluster the active controller is the leader of the metadata quorum, which can run on dedicated controller nodes rather than on brokers.
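Which diagnostic command applies depends on the mode a node runs in. As a minimal sketch, assuming a plain key=value server.properties, a script can tell the two apart: KRaft nodes set process.roles, while ZooKeeper-mode brokers set zookeeper.connect instead. The function name metadata_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which metadata mode a node's config uses.
# Assumes key=value syntax in server.properties (":" separators are not handled).
metadata_mode() {
  local props="$1"
  # process.roles is a KRaft-only setting, so it is checked first.
  # The key must start the line, so commented-out entries are ignored.
  if grep -Eq '^[[:space:]]*process\.roles[[:space:]]*=' "$props"; then
    echo kraft
  elif grep -Eq '^[[:space:]]*zookeeper\.connect[[:space:]]*=' "$props"; then
    echo zookeeper
  else
    echo unknown
  fi
}
```

For example, metadata_mode /path/to/server.properties (the path varies by distribution). Nodes in the middle of a ZooKeeper-to-KRaft migration do not fit neatly into either bucket, so treat the result as a hint rather than a verdict.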

NotControllerException / “This is not the correct controller for this cluster” means a request that only the controller can serve arrived at a broker that is not the current controller. This is overwhelmingly normal and benign: the controller role moved to another broker (a failover), and a client or peer briefly held a stale view of which broker was in charge. Kafka clients are designed to retry — they refresh metadata, learn the new controller, and re-send. A handful of these lines around a restart or failover is expected.

It only becomes a real problem if the error is persistent: every controller request fails because no broker believes it is the controller, or two brokers each think they are (a stale controller that has not yet resigned).
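Kafka clients already retry NotControllerException internally, so normally nothing extra is needed. When a shell script drives admin CLIs through a failover or rolling restart, though, a bounded outer retry captures the same transient-versus-persistent distinction: a transient error clears within a few attempts, a persistent one exhausts them. A minimal sketch; retry_with_backoff, RETRY_MAX and RETRY_BASE_DELAY are hypothetical names, and the defaults are assumptions, not Kafka settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
#   RETRY_MAX         attempts before giving up (default 5, an assumption)
#   RETRY_BASE_DELAY  first delay in seconds, doubled after each failure (default 1)
# Returns 0 as soon as the command succeeds, 1 once all attempts have failed.
retry_with_backoff() {
  local max="${RETRY_MAX:-5}" delay="${RETRY_BASE_DELAY:-1}" attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff kafka-topics.sh --bootstrap-server b1:9092,b2:9092 --describe --topic repro, where b1 and b2 are placeholder hosts. Prefer wrapping idempotent commands: a retried --create whose first attempt actually succeeded will fail because the topic exists, which is what kafka-topics.sh --if-not-exists is for.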

Common Causes

  • Normal controller failover. The old controller restarted, was shut down for maintenance, or lost its ZooKeeper session / quorum leadership, and election handed the role to another broker. Clients with cached metadata briefly hit the old broker.
  • Stale client or admin tool metadata. A long-lived AdminClient, MirrorMaker, or monitoring tool cached the controller id and keeps hitting the former controller until it refreshes.
  • A resigned-but-not-fully-down controller. Broker 1 resigned the controller role but is still up; requests routed to it (by an out-of-date client) get the not-controller response until they re-resolve.
  • Rolling restart in progress. During a rolling restart the controller hops between brokers several times; transient NotControllerException is expected for the duration.
  • Genuine controller instability (the bad case). Flapping ZooKeeper sessions or an unstable KRaft quorum cause the controller to move repeatedly, so clients never settle on a stable target.

How to Reproduce the Error

On a non-production cluster, force a controller failover and watch a client retry:

  1. Identify the current controller (see Diagnostic Commands).
  2. Cleanly stop the controller broker: sudo systemctl stop kafka on that node.
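  3. Immediately run a controller-only operation from a client with stale metadata whose bootstrap list includes the old controller and at least one surviving broker, e.g. kafka-topics.sh --bootstrap-server <old-controller>:9092,<surviving-broker>:9092 --create --topic repro --partitions 1 --replication-factor 1. A bootstrap list containing only the stopped broker cannot connect at all.
  4. The first attempt may fail against the stale controller (as NotControllerException, or as a connection error if the old broker is already gone); the client refreshes metadata and the retry succeeds against the newly elected controller. Because the admin client retries internally, the command itself usually succeeds and the transient error may be visible only in client or broker logs.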

This demonstrates the benign path: a transient error followed by automatic recovery once metadata is refreshed.
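For repeatable drills, the reproduction steps can be wrapped in a function that prints the commands by default and only runs them when asked. A minimal sketch: failover_drill is a hypothetical name, and it assumes SSH access to the controller host, a systemd unit named kafka (as in step 2), and port 9092 on every broker.

```shell
#!/usr/bin/env bash
# Hypothetical failover drill for a NON-production cluster.
#   $1 = host of the current controller, $2 = host of a surviving broker
# Dry run by default: with RUN unset, each command is only echoed.
# Set RUN= (empty) to execute the commands for real.
failover_drill() {
  local old_controller="$1" survivor="$2" run="${RUN-echo}"
  # $run is deliberately unquoted so that an empty value disappears.
  # Step 2: stop the controller broker cleanly (assumes SSH and sudo rights).
  $run ssh "$old_controller" sudo systemctl stop kafka
  # Step 3: a controller-only operation; the bootstrap list keeps a live
  # broker so the client can still connect and refresh its metadata.
  $run kafka-topics.sh --bootstrap-server "${old_controller}:9092,${survivor}:9092" \
    --create --topic repro --partitions 1 --replication-factor 1
}
```

failover_drill kafka1 kafka2 prints the two commands for review; RUN= failover_drill kafka1 kafka2 executes them.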

Diagnostic Commands

All commands below are read-only.

Find the active controller in a KRaft cluster:

kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
ClusterId:              k4f-9a1c
LeaderId:               3
LeaderEpoch:            27
HighWatermark:          1048576
CurrentVoters:          [1,2,3]
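In that output the LeaderId field identifies the active controller. For scripts and alerts it is handy to reduce the output to just that id; a minimal sketch, assuming the "Key: value" layout shown above (leader_id_from_status is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the LeaderId (active controller id) from the
# output of `kafka-metadata-quorum.sh ... describe --status` read on stdin.
# Exits non-zero if no LeaderId line is present.
leader_id_from_status() {
  awk -F': *' '$1 == "LeaderId" { print $2; found = 1 } END { exit !found }'
}
```

Usage: kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status | leader_id_from_status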

LeaderId is the active controller. For a ZooKeeper-based cluster, read the /controller znode (legacy, read-only get):

zookeeper-shell.sh localhost:2181 get /controller
{"version":2,"brokerid":3,"timestamp":"1782050532118"}
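The same reduction works in ZooKeeper mode. zookeeper-shell.sh may print connection messages before the znode data, so this minimal sketch (brokerid_from_znode is a hypothetical name) keeps only the brokerid from the JSON line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the controller's broker id from the output of
# `zookeeper-shell.sh <zk> get /controller` read on stdin. Lines without a
# "brokerid" field (connection messages, watcher events) are ignored.
brokerid_from_znode() {
  sed -n 's/.*"brokerid":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: zookeeper-shell.sh localhost:2181 get /controller | brokerid_from_znode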

Confirm broker connectivity and the API view of the cluster:

kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | head -5

Inspect controller transitions in the logs to see whether failover is one-time or flapping:

grep -E "successfully elected as the controller|Resigned|not the correct controller" \
  /var/log/kafka/server.log | tail -20
journalctl -u kafka --since "30 min ago" | grep -iE "controller|NotController"
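To tell a one-time failover from flapping, it helps to count elections over time rather than eyeball the grep output. A minimal sketch, assuming the default timestamp prefix shown in the sample log ([yyyy-MM-dd HH:mm:ss,SSS]); the function names and the threshold of two elections per hour are assumptions to tune for your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting controller flapping in server.log on stdin.
# Assumes each entry starts with "[yyyy-MM-dd HH:mm:ss,SSS]", so characters
# 2-14 of a line are "yyyy-MM-dd HH" (the date plus the hour).

# Print "<date> <hour>:00 <elections>" for every hour with at least one election.
elections_per_hour() {
  grep 'successfully elected as the controller' |
    cut -c2-14 |
    sort | uniq -c |
    awk '{ print $2, $3 ":00", $1 }'
}

# Print only the hours whose election count exceeds a threshold (default 2).
flapping_hours() {
  local threshold="${1:-2}"
  elections_per_hour | awk -v t="$threshold" '$3 > t'
}
```

Usage: flapping_hours 2 < /var/log/kafka/server.log. Empty output means no hour exceeded the threshold. Rotated files can be included with cat /var/log/kafka/server.log* | flapping_hours 2; file order does not matter because the timestamps are sorted.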

Step-by-Step Resolution

  1. Determine the real controller with kafka-metadata-quorum.sh ... describe --status (KRaft) or zookeeper-shell.sh ... get /controller (ZK). Note the broker id.
  2. Confirm the error is transient. Grep server.log for “successfully elected as the controller”. If there is a single recent election and traffic recovered, no action is needed — clients self-heal.
  3. Point tools at bootstrap servers, not a fixed broker. If an admin tool keeps failing, ensure it uses --bootstrap-server with multiple brokers so it can re-resolve the controller. Restart any long-lived client that cached a dead controller.
  4. If the role is flapping, check the controller broker’s health: ZooKeeper session timeouts (zookeeper.session.timeout.ms), GC pauses, or quorum instability. Look for repeated “Resigned” / “elected” pairs within minutes.
  5. For a resigned-but-up broker, verify it has actually relinquished the role (its log shows “Resigned” and the quorum/znode points elsewhere). If it still advertises itself as controller, restart it cleanly.
  6. Validate recovery by re-running the controller operation against bootstrap servers; it should succeed against the current leader.
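Step 5 can be scripted by comparing a broker's most recent controller transition with the active id found in step 1. A minimal sketch: check_resigned is a hypothetical name, it reads that broker's own server.log on stdin, and it relies on the "Resigned" and "successfully elected as the controller" lines shown in the sample log, whose wording may differ across Kafka versions and in KRaft mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 5: has a broker really given up the controller role?
#   $1 = id of the broker whose own server.log is on stdin
#   $2 = active controller id (LeaderId or the /controller brokerid)
# Prints a verdict; returns 0 when consistent, 1 when it needs a closer look.
check_resigned() {
  local broker_id="$1" active_id="$2" last_event
  # Most recent controller-role transition recorded in this broker's log.
  last_event=$(grep -E 'successfully elected as the controller|Resigned' | tail -n 1)
  if [ "$broker_id" = "$active_id" ]; then
    echo "OK: broker $broker_id is the active controller"
    return 0
  fi
  case "$last_event" in
    *Resigned*)
      echo "OK: broker $broker_id resigned; active controller is $active_id"
      return 0 ;;
    *"successfully elected"*)
      echo "STALE: broker $broker_id last logged an election, but $active_id is active"
      return 1 ;;
    *)
      echo "UNKNOWN: no controller transitions in this log for broker $broker_id"
      return 1 ;;
  esac
}
```

For example, on broker 1: check_resigned 1 3 < /var/log/kafka/server.log, where 3 is the id found in step 1. A STALE verdict matches the "still advertises itself as controller" case in step 5.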

Prevention and Best Practices

  • Always connect clients and admin tools to a bootstrap list of multiple brokers, never a single hardcoded controller, so metadata refresh can find the new controller after failover.
  • Keep client libraries current; modern clients handle NotControllerException transparently with retries and backoff. Avoid disabling retries on AdminClient.
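  • Do rolling restarts one broker at a time, waiting after each step until the restarted broker has rejoined the cluster and under-replicated partitions are back to zero, and intentionally restart the controller last so the role only moves once.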
  • Stabilize the metadata layer: tune zookeeper.session.timeout.ms for ZK clusters or ensure low-latency, well-provisioned controllers for KRaft, to prevent unnecessary failovers.
  • Alert on controller election rate, not on individual NotControllerException events — a spike in elections is the real signal.

Related Errors

  • Error while electing or becoming controller on broker N: election itself is failing, not just a stale view.
  • Controller not available: no broker currently holds the controller role (quorum loss).
  • Controller epoch X is older than Y: a stale controller acting on an outdated epoch (split brain / fencing).
  • Controller moved to another broker (info-level): the normal log line announcing failover, not an error.

Frequently Asked Questions

Is NotControllerException an error I need to fix? Usually not. It is a retriable exception. A few around a restart or failover are normal; clients refresh metadata and retry automatically. Only persistent or high-frequency occurrences warrant investigation.

Why did the controller move at all? Common triggers: the controller broker restarted or was stopped, it lost its ZooKeeper session, it lost KRaft quorum leadership, or a rolling restart shifted the role. Election then picks a new controller and bumps the epoch.

How do I know which broker is the controller right now? In KRaft, kafka-metadata-quorum.sh ... describe --status shows LeaderId. In ZooKeeper mode, get /controller returns the brokerid.

Both my brokers logged controller activity — is that split brain? Not necessarily. During failover the old controller logs “Resigned” and the new one logs “elected”. Split brain is when two brokers both believe they are active concurrently and act on different epochs — see the controller epoch mismatch guide.

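Can I pin the controller to one broker? No, and you should not try. Controller election is automatic and the role is meant to move on failure. In KRaft you choose which nodes are eligible to be controllers (the quorum voters), but not which of them leads at any moment. Design clients to follow the role via bootstrap servers instead.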
