Kafka Error Guide: 'Controller not available' No Active Controller
Fix Kafka 'Controller not available / controller connection failed': diagnose quorum loss, no elected controller, ZooKeeper outages, and KRaft voter majority failures.
- #kafka
- #troubleshooting
- #errors
- #controller
Exact Error Message
An admin operation that needs the controller fails because no broker is currently acting as one. From a client/admin tool:
org.apache.kafka.common.errors.ControllerMovedException: Controller not available
or, more commonly, it surfaces in server.log as a broker that cannot find the controller:
[2026-06-28 11:03:52,140] WARN [Broker id=2] Controller not available, retrying metadata request (org.apache.kafka.clients.NetworkClient)
[2026-06-28 11:03:52,455] ERROR [Broker id=2] Connection to controller (id 1) failed.
Controller connection failed: java.net.ConnectException: Connection refused (kafka.server.BrokerToControllerChannelManager)
[2026-06-28 11:03:53,001] WARN [Broker id=2] No controller is currently active in the cluster (kafka.controller.KafkaController)
In a KRaft cluster the quorum tool shows no leader:
LeaderId: -1
LeaderEpoch: 34
CurrentVoters: [1,2,3]
What the Error Means
“Controller not available” means there is currently no broker holding the controller role, or the controller exists but no broker can reach it. Because the controller is the single component that performs metadata operations — topic create/delete, leader election, partition reassignment, ISR updates — its absence freezes all administrative changes. Existing produce/consume traffic on already-elected partition leaders may continue for a while, but any operation requiring metadata coordination fails or hangs.
There are two distinct flavors:
- No elected controller (the worst case): election cannot complete because the underlying coordination layer has lost quorum, meaning a majority of ZooKeeper nodes are down or a majority of KRaft controller voters are unreachable. A LeaderId of -1 in the quorum tool is the signature.
- Controller unreachable: a controller is elected but a particular broker or client cannot connect to it (network/firewall/listener issue), so it reports “controller connection failed”.
The first is a cluster-wide control-plane outage; the second is a connectivity problem from one node’s perspective. Distinguishing them is the core of diagnosis.
Common Causes
- ZooKeeper quorum loss (ZK mode): fewer than a majority of ensemble nodes are up, so no controller can be elected or maintained.
- KRaft voter majority loss: a majority of controller-quorum voters are down or partitioned; LeaderId becomes -1.
- Controller broker down with no election possible because the metadata layer cannot record the new controller.
- Network partition / firewall blocking the controller’s listener port so peers report “controller connection failed”.
- Wrong controller.quorum.voters / controller.listener.names configuration in KRaft, so voters cannot form a quorum.
- All controller-eligible nodes restarted simultaneously, leaving no quorum during the gap.
How to Reproduce the Error
On a disposable cluster:
- ZK mode: stop a majority of the ZooKeeper ensemble (e.g. 2 of 3). Brokers log “No controller is currently active” and admin ops return “Controller not available”.
- KRaft mode: stop a majority of the controller voters (e.g. 2 of 3 dedicated controllers). kafka-metadata-quorum.sh ... describe --status reports LeaderId: -1.
- Run kafka-topics.sh --bootstrap-server localhost:9092 --create --topic repro ...; it fails because there is no controller.
- Restore the majority of ZK nodes / KRaft voters; a controller is elected and the error clears.
Do this only in a throwaway environment — it intentionally takes down the control plane.
Diagnostic Commands
All read-only.
First determine whether a controller exists at all (KRaft):
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication
LeaderId: -1 means no active controller — quorum problem. A valid LeaderId with brokers still erroring means a connectivity problem to that node.
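The LeaderId rule above can be scripted so the verdict does not depend on reading the output by eye. The following is a minimal sketch, assuming the `describe --status` output keeps the `LeaderId: <n>` line format shown earlier; `classify_quorum_status` is a hypothetical helper name, not part of Kafka.

```shell
# Reads `describe --status` output on stdin and prints a one-word verdict.
# classify_quorum_status is a hypothetical helper, not a Kafka tool.
classify_quorum_status() {
  # Pull the value after "LeaderId:" and strip the whitespace around it.
  leader_id=$(awk -F: '/^[ \t]*LeaderId[ \t]*:/ { gsub(/[ \t\r]/, "", $2); print $2; exit }')
  case "$leader_id" in
    "") echo "unknown" ;;             # no LeaderId line found in the input
    -*) echo "no-controller" ;;       # negative id: no elected quorum leader
    *)  echo "controller-elected" ;;  # a leader exists; suspect connectivity
  esac
}
```

A possible invocation is `kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status | classify_quorum_status`. An `unknown` result usually means the tool itself failed or timed out, which is worth checking separately.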
For ZooKeeper mode, check for the controller znode and ZK health (read-only):
zookeeper-shell.sh localhost:2181 get /controller
echo srvr | nc localhost 2181 | grep -i mode
If get /controller returns “Node does not exist”, no controller is elected.
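When the znode does exist, its payload names the broker that currently holds the controller role. The sketch below assumes the payload is a JSON object containing a numeric `brokerid` field (the layout commonly seen in ZooKeeper-mode clusters; the exact fields can vary by Kafka version). `controller_broker_id` is a hypothetical helper name.

```shell
# Reads `zookeeper-shell.sh ... get /controller` output on stdin and prints
# the broker id found in the znode payload, or nothing if there is none.
# Assumes a payload such as {"version":1,"brokerid":2,"timestamp":"..."}.
controller_broker_id() {
  sed -n 's/.*"brokerid"[ ]*:[ ]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example, `zookeeper-shell.sh localhost:2181 get /controller | controller_broker_id` prints the id of the controller broker, and prints nothing when the znode is missing.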
Verify reachability and which brokers respond:
kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | head -5
Pull the relevant log lines:
grep -E "Controller not available|No controller is currently active|Controller connection failed|controller (id [0-9]+) failed" \
/var/log/kafka/server.log | tail -30
journalctl -u kafka --since "1 hour ago" | grep -iE "controller|quorum|zookeeper|connection refused"
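Counting how often each signature appears can hint at which flavor of failure dominates. This is a rough sketch that matches only the message strings quoted in this guide; real log wording may differ between Kafka versions, and `summarize_controller_errors` is a hypothetical helper name.

```shell
# Reads broker log lines on stdin and prints one summary line with a count
# per error signature. The patterns are the strings quoted in this guide.
summarize_controller_errors() {
  awk '
    /No controller is currently active/ { none++ }
    /Controller connection failed|Connection to controller .* failed/ { conn++ }
    /Controller not available/ { unavailable++ }
    END {
      printf "no_active_controller=%d connection_failed=%d not_available=%d\n", none, conn, unavailable
    }'
}
```

It could be fed from the log directly, for example `tail -n 5000 /var/log/kafka/server.log | summarize_controller_errors`. Many `connection_failed` hits with no `no_active_controller` hits point toward connectivity rather than quorum loss, but the quorum or znode check remains the authoritative test.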
Step-by-Step Resolution
- Classify the failure. Run the quorum/znode check. LeaderId: -1 or a missing /controller znode = no controller (quorum loss). A valid leader but per-broker “connection failed” = connectivity problem.
- If quorum is lost (ZK): restore a majority of ZooKeeper nodes. Verify with echo srvr | nc <zk> 2181 that one node reports Mode: leader and the rest report Mode: follower. Once quorum returns, brokers elect a controller automatically.
- If quorum is lost (KRaft): bring back enough controller voters to form a majority of CurrentVoters. Confirm controller.quorum.voters is identical on every voter and that their listener ports are reachable.
- If the controller is reachable by some brokers but not others: treat it as a network problem. Check firewall rules and the controller listener port, and confirm the affected broker can open a TCP connection to the controller’s advertised address.
- If all controller-eligible nodes were restarted together, simply wait for a majority to come back; election needs a quorum present simultaneously.
- Validate: describe --status shows a non-negative LeaderId (KRaft) or get /controller returns a broker (ZK), and admin operations succeed again.
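The reachability check in the KRaft and network steps can be sketched as a small probe over the voter list. It assumes `controller.quorum.voters` uses the `id@host:port` comma-separated form, and that the local netcat supports `-z` (scan only) and `-w` (timeout), which varies between netcat variants. Both function names are hypothetical, and IPv6 literals are not handled.

```shell
# Turns "1@host1:9093,2@host2:9093" into lines of "id host port".
parse_quorum_voters() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)@\(.*\):\([0-9][0-9]*\)[[:space:]]*$/\1 \2 \3/p'
}

# Tries a TCP connection to every voter and reports the result per voter.
# Assumes an `nc` that accepts -z and -w; adjust for your netcat variant.
probe_quorum_voters() {
  parse_quorum_voters "$1" | while read -r id host port; do
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "voter $id $host:$port reachable"
    else
      echo "voter $id $host:$port UNREACHABLE"
    fi
  done
}
```

Run it from the broker that reports the failure, passing the value from that broker's own configuration, for example `probe_quorum_voters "1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093"` (the host names here are placeholders). A voter that is unreachable from one broker but reachable from others suggests a firewall or listener problem rather than quorum loss.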
Prevention and Best Practices
- Always run an odd-sized quorum (3 or 5 ZK nodes / KRaft voters) and never let a majority go offline at once during maintenance — drain and replace one node at a time.
- Keep controller.quorum.voters and controller.listener.names consistent across all KRaft voters; a mismatch silently prevents quorum.
- Spread controller-eligible nodes across failure domains (racks/AZs) so a single fault cannot take out a majority.
- Open and monitor the controller listener port between all brokers and voters so “controller connection failed” can’t come from a firewall change.
- Alert on ActiveControllerCount (should be exactly 1 cluster-wide) and on LeaderId == -1; either is a control-plane outage.
- For triage, the free incident assistant can turn the log and quorum output into a “quorum vs connectivity” verdict quickly.
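The consistency rule for controller.quorum.voters lends itself to a pre-deployment check. The sketch below compares the property across local copies of each voter's properties file; the function names and file paths are hypothetical, and it only performs an exact string comparison, so reordered but equivalent voter lists are reported as a mismatch.

```shell
# Prints the controller.quorum.voters value from one properties file.
# If the key appears more than once, the last definition is used.
quorum_voters_of() {
  sed -n 's/^[[:space:]]*controller\.quorum\.voters[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Returns 0 only if every file given carries the same non-empty value.
voters_consistent() {
  first=$(quorum_voters_of "$1")
  [ -n "$first" ] || { echo "no controller.quorum.voters in $1"; return 1; }
  for f in "$@"; do
    current=$(quorum_voters_of "$f")
    if [ "$current" != "$first" ]; then
      echo "mismatch in $f: '$current' (expected '$first')"
      return 1
    fi
  done
  echo "consistent: $first"
}
```

For instance, after copying each voter's configuration into a working directory, `voters_consistent ctrl1.properties ctrl2.properties ctrl3.properties` could gate a rollout in a CI step.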
Related Errors
- “This is not the correct controller for this cluster”: a controller does exist; the client just hit a stale one.
- “Error while electing or becoming controller on broker N”: election is actively failing (often the precursor to no controller being available).
- “Controller epoch X is older than Y”: a stale controller after split brain, not an absent one.
- “java.net.ConnectException: Connection refused”: the transport-level cause behind “controller connection failed”.
Frequently Asked Questions
Why can producers still work while the controller is “not available”?
Already-elected partition leaders keep serving produce/consume for a time. Only operations needing the controller (topic management, reassignment, new leader election) fail. A prolonged outage will eventually impact availability as leaders fail without replacement.
What does LeaderId: -1 mean?
There is no elected metadata-quorum leader in KRaft — i.e., no active controller. It almost always means a majority of controller voters is unreachable.
Is this a network problem or a quorum problem?
Check the quorum tool / /controller znode. If no controller exists anywhere, it’s quorum. If a controller exists but one broker can’t reach it, it’s connectivity. The fixes are completely different.
How many ZooKeeper or KRaft nodes can I lose?
You need a strict majority online. For 3 nodes you can lose 1; for 5 you can lose 2. Lose the majority and you lose the controller.
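The majority arithmetic behind that answer is small enough to sketch directly. The function names below are illustrative only; the rule itself is the strict-majority requirement stated above.

```shell
# Smallest number of nodes that forms a strict majority of an n-node quorum.
majority_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail while a strict majority is still online.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

`tolerated_failures 3` prints 1 and `tolerated_failures 5` prints 2. `tolerated_failures 4` also prints 1, which is why an even-sized quorum adds a node without adding fault tolerance.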
Can I manually elect a controller?
No. Election is automatic once a quorum is present. The fix is to restore the quorum or the connectivity, not to force election.