AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'NotLeaderOrFollowerException' Stale Leader Metadata

How to fix Kafka's NotLeaderOrFollowerException (formerly NotLeaderForPartitionException), which is caused by stale client metadata after leader moves, partition reassignments, and broker restarts.

  • #kafka
  • #troubleshooting
  • #errors
  • #metadata

Exact Error Message

This exception appears when a client sends a produce or fetch request to a broker that is no longer the leader for the target partition:

[2026-06-29 16:33:09,772] WARN [Producer clientId=producer-1] Received invalid metadata error in produce request on partition orders-v2-3 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic-partition. (org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.errors.NotLeaderOrFollowerException: This server is not the leader for that topic-partition.

This exception replaced the older NotLeaderForPartitionException; you may still see the legacy name in older clients or documentation:

org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

What the Error Means

The client sent a request to broker X for partition P, but broker X is no longer the leader (and, for replica-targeted requests, not even a replica) of P. Leadership moved — because of a broker restart, a preferred-leader election, or a partition reassignment — and the client’s cached metadata still points at the old leader.

This is a retriable exception and almost always transient. When a client receives it, it knows its metadata is stale, triggers an immediate metadata refresh, learns the new leader, and reroutes the request. End-to-end this usually costs a few milliseconds to a second, and well-behaved producers/consumers recover with no data loss and no operator involvement.

It is only worth investigating when it is frequent and sustained, which points at metadata that keeps going stale — for example flapping leadership, an unstable broker, or a controller that cannot propagate metadata cleanly.
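To tell a one-off burst from a sustained problem, it can help to count the warnings per minute and per partition. The following is a minimal sketch, assuming the client log uses the `[yyyy-MM-dd HH:mm:ss,SSS]` prefix and the "on partition ... due to" wording shown in the sample above; the function name and the log path in the comment are hypothetical.

```shell
# Count NotLeaderOrFollowerException warnings per minute and per partition.
# Reads a client log on stdin. Lines without the timestamp prefix or the
# "on partition <name> due to" wording are ignored.
# Example (hypothetical path): count_not_leader_per_minute < /var/log/myapp/client.log
count_not_leader_per_minute() {
  grep -E 'NotLeaderOrFollowerException|NotLeaderForPartitionException' \
    | sed -nE 's/^\[([0-9-]+ [0-9]{2}:[0-9]{2}):[0-9]{2},[0-9]+\].* on partition ([^ ]+) due to .*/\1 \2/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $3, $4, $1 }'   # date, HH:MM, partition, count
}
```

A handful of hits in one minute that then stop is the expected transient pattern; the same partition appearing minute after minute is the sustained case worth investigating.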

Common Causes

  • Leader moved during a broker restart or rolling upgrade; the client still targets the old leader.
  • Preferred leader election rebalanced leadership back to a restored broker.
  • Partition reassignment changed which brokers host (and lead) a partition.
  • Stale client metadata: clients refresh proactively only every metadata.max.age.ms (5 minutes by default), so a leadership change between refreshes is discovered through this error.
  • Flapping broker repeatedly losing and regaining leadership, generating sustained errors.
  • Controller instability causing inconsistent or slowly propagated metadata.
  • Network partition that isolated a broker, forcing a leader change the client has not learned about.

How to Reproduce the Error

Trigger a leadership change while a client is actively producing:

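# Inspect current leaders before the change
kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic orders-v2
# Option 1: restart the broker that currently leads partition 3 (run on that broker's host;
# the unit name "kafka" matches the journalctl command used later in this guide)
sudo systemctl restart kafka
# Option 2: if leadership is not on the preferred replica, trigger a preferred election
kafka-leader-election.sh --bootstrap-server kafka-1:9092 --election-type preferred --topic orders-v2 --partition 3
# Either way, a client mid-stream will briefly log NotLeaderOrFollowerException for that partition.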

While the leader for orders-v2-3 moves from the restarted broker to another replica, an in-flight producer holding the old metadata logs the exception once, refreshes metadata, and resumes. Sustained reproduction requires a broker that keeps losing/regaining leadership.
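To confirm that leadership actually moved, one option is to record the leader of the partition before and after the change. This is a minimal sketch that parses the text printed by `kafka-topics.sh --describe`; it assumes the usual `Partition: N  Leader: N` fields on each partition line, and `leader_of_partition` is a hypothetical helper name.

```shell
# Print the leader broker id for one partition.
# $1 = partition number; reads `kafka-topics.sh --describe` output on stdin.
leader_of_partition() {
  awk -v p="$1" '
    {
      part = ""; leader = ""
      # Each partition line is a sequence of "Key: value" pairs.
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
      }
      if (part == p && leader != "") print leader
    }'
}

# Example:
#   kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic orders-v2 \
#     | leader_of_partition 3
```

Running it once before and once after the restart or election should show two different broker ids if the leader moved.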

Diagnostic Commands

Confirm the current leader for each partition and look for under-replication:

kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic orders-v2
kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --under-replicated-partitions
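The describe output can also show where leadership has moved off the preferred replica, which is the first broker in the Replicas list. The sketch below lists those partitions; it assumes the usual `Topic:`, `Partition:`, `Leader:`, and `Replicas:` fields, and `non_preferred_leaders` is a hypothetical helper name.

```shell
# List partitions whose current leader is not the preferred replica
# (the first broker id in the Replicas list).
# Reads `kafka-topics.sh --describe` output on stdin.
non_preferred_leaders() {
  awk '
    {
      topic = ""; part = ""; leader = ""; replicas = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic = $(i + 1)
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
      }
      # Skip the topic summary line, which has no Partition/Replicas fields.
      if (part == "" || replicas == "") next
      split(replicas, r, ",")
      if (leader != r[1]) {
        printf "%s-%s leader=%s preferred=%s\n", topic, part, leader, r[1]
      }
    }'
}

# Example:
#   kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic orders-v2 \
#     | non_preferred_leaders
```

An empty result means every partition is led by its preferred replica; a long list shortly after a restart is normal until a preferred election runs.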

Check controller quorum health (kafka-metadata-quorum.sh applies to KRaft clusters only) and broker reachability:

kafka-metadata-quorum.sh --bootstrap-server kafka-1:9092 describe --status
kafka-broker-api-versions.sh --bootstrap-server kafka-1:9092

Look for leadership churn and reassignment activity in the logs:

sudo journalctl -u kafka --since "20 min ago" | grep -iE 'leader|election|reassign|isr'
grep -iE 'leader.imbalance|auto.leader.rebalance.enable' /opt/kafka/config/server.properties
ss -ltnp | grep -E ':9092|:9093'

Step-by-Step Resolution

  1. Confirm it is transient. Check whether the client recovers on its own within a second or two. A single burst around a known restart or reassignment is expected and needs no action.
  2. Describe the topic. Run kafka-topics.sh --describe and note the current Leader for the affected partitions; if the leaders look stable now, the error was just stale metadata.
  3. Correlate with cluster events. Was there a restart, upgrade, preferred-leader election, or reassignment in the same window? That explains a transient burst.
  4. If sustained, find the unstable broker. Use journalctl to spot a broker repeatedly entering/leaving ISR or losing leadership. An unhealthy broker that flaps causes continuous errors.
  5. Check the controller. On KRaft clusters, run kafka-metadata-quorum.sh describe --status. A struggling controller propagates metadata slowly and prolongs staleness, so stabilize it first.
  6. Tune client metadata freshness if needed. A very high metadata.max.age.ms lengthens the stale window after leader moves. The default is 300000 ms (5 minutes) and is usually fine; verify the client is not overriding it with a much larger value.
  7. Verify recovery. After the cluster stabilizes, the warnings should stop and the topic should show stable leaders with full ISR.
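The final verification step can be scripted. The sketch below flags partitions with no leader or a short ISR and polls until none remain. It assumes the usual `--describe` field layout, and both function names are hypothetical; the command to poll is passed in as arguments so the helper does not hard-code a bootstrap server.

```shell
# Print partitions that have no leader or whose ISR is smaller than the replica set.
# Reads `kafka-topics.sh --describe` output on stdin.
unhealthy_partitions() {
  awk '
    {
      topic = ""; part = ""; leader = ""; replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic = $(i + 1)
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr = $(i + 1)
      }
      if (part == "" || replicas == "") next
      # An empty Isr value makes the next "Key:" token look like the value.
      if (isr ~ /:$/) isr = ""
      n_rep = split(replicas, r, ",")
      n_isr = split(isr, s, ",")
      if (leader == "none" || leader == "-1" || n_isr < n_rep) {
        printf "%s-%s leader=%s isr=%d/%d\n", topic, part, leader, n_isr, n_rep
      }
    }'
}

# Poll until no unhealthy partitions remain.
# $1 = max attempts, $2 = seconds between attempts, rest = command printing describe output.
wait_until_stable() {
  attempts=$1; delay=$2; shift 2
  while [ "$attempts" -gt 0 ]; do
    bad=$("$@" | unhealthy_partitions)
    [ -z "$bad" ] && return 0
    printf '%s\n' "$bad" >&2          # show what is still unhealthy
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep "$delay"
  done
  return 1
}

# Example:
#   wait_until_stable 30 10 kafka-topics.sh --bootstrap-server kafka-1:9092 \
#     --describe --topic orders-v2
```

The same loop is one way to gate a rolling restart: move on to the next broker only after it returns success.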

Prevention and Best Practices

  • Rely on client retries — this exception is retriable by design, and clients refresh metadata automatically on receipt. Do not fail fast on it.
  • Perform rolling restarts one broker at a time, waiting for under-replicated partitions to reach zero between steps, to minimize leadership churn.
  • Keep auto.leader.rebalance.enable=true so leadership returns to preferred replicas predictably rather than piling onto a few brokers.
  • Schedule partition reassignments during low-traffic windows and expect a brief burst of these warnings as clients refresh.
  • Monitor leadership churn and ISR shrink/expand rates so a flapping broker pages you before clients see sustained errors.
  • Keep metadata.max.age.ms at sane defaults so clients pick up leader changes promptly.

Related Errors

  • NotLeaderForPartitionException: the legacy name this exception replaced; same meaning.
  • LeaderNotAvailableException: there is no leader at all yet, versus the leader having simply moved.
  • UnknownTopicOrPartitionException: the contacted broker does not host the partition because the topic is missing or metadata is badly stale.
  • ReplicaNotAvailableException: a replica is temporarily unavailable, usually informational.

Frequently Asked Questions

Is NotLeaderOrFollowerException the same as NotLeaderForPartition? Effectively yes. NotLeaderOrFollowerException replaced NotLeaderForPartitionException and broadened the meaning to cover requests intended for any replica, not just the leader.

Do I lose messages when this happens? No, as long as retries are enabled. The producer refreshes metadata and resends the batch to the new leader. With acks=all and retries left enabled, a leader move on its own does not lose data.
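A quick way to check the "proper acks and retries" condition is to scan the producer's properties file for risky overrides. This is a minimal sketch, not a full validator: `prop` and `check_producer_props` are hypothetical helpers, the file is assumed to use plain `key=value` lines, and unset keys are treated as client defaults (note that producers older than Kafka 3.0 default to acks=1).

```shell
# Print the value of one key from a Java-style properties file (last occurrence wins).
# $1 = file, $2 = key. Prints nothing if the key is absent.
prop() {
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -nE "s/^[[:space:]]*${key_re}[[:space:]]*[=:][[:space:]]*(.*)$/\1/p" "$1" | tail -n 1
}

# Warn about producer overrides that weaken recovery from a leader move.
# Returns 0 if nothing risky is found, 1 otherwise.
check_producer_props() {
  file=$1
  status=0
  acks=$(prop "$file" acks)
  retries=$(prop "$file" retries)
  max_age=$(prop "$file" metadata.max.age.ms)

  case "$acks" in
    ''|all|-1) ;;   # unset is assumed to mean the client default
    *) echo "acks=$acks: writes are not confirmed by all in-sync replicas"; status=1 ;;
  esac
  if [ "$retries" = "0" ]; then
    echo "retries=0: the producer will not resend after a NotLeaderOrFollowerException"
    status=1
  fi
  if [ -n "$max_age" ] && [ "$max_age" -gt 300000 ]; then
    echo "metadata.max.age.ms=$max_age is above the 300000 ms default"
    status=1
  fi
  return "$status"
}

# Example (hypothetical path): check_producer_props /etc/myapp/producer.properties
```

The thresholds here only mirror the client defaults; a value flagged by the script is a prompt to review the setting, not proof of a misconfiguration.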

Why do I see a burst of these during a deploy? Rolling restarts and preferred-leader elections move leadership. Clients with cached metadata briefly target the old leader, get this error, refresh, and reroute — a normal, transient burst.

It is happening constantly, not in bursts. What is wrong? Sustained errors mean leadership keeps changing. Look for a flapping broker (ISR churn in the logs) or an unstable controller, and stabilize that broker or the quorum.

Should my client catch and handle it? Standard Kafka clients already handle it via automatic metadata refresh and retry. You generally should not special-case it in application code.
