By James Joyner IV · 9 min read

Kafka Error Guide: 'ReplicaNotAvailableException' Replica Reassignment Notice

Understand Kafka ReplicaNotAvailableException: usually transient and informational during reassignment, when to ignore it, and when a replica is truly offline.

  • #kafka
  • #troubleshooting
  • #errors
  • #replication

Exact Error Message

This exception typically appears in metadata responses or admin/CLI output, often during a partition reassignment:

[2026-06-29 13:05:41,290] WARN [Consumer clientId=consumer-1, groupId=analytics] Error while fetching metadata with correlation id 9 : {events-v1=REPLICA_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
org.apache.kafka.common.errors.ReplicaNotAvailableException: The replica is not available for the requested topic-partition.

You may also see it surface when describing a topic mid-reassignment, alongside otherwise healthy leader/ISR information:

Topic: events-v1  Partition: 4  Leader: 2  Replicas: 2,3,5  Isr: 2,3
org.apache.kafka.common.errors.ReplicaNotAvailableException: The replica is not available for the requested topic-partition.

What the Error Means

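ReplicaNotAvailableException indicates that a particular replica of a topic-partition is not currently available, most often because that replica is being created, moved, or has not yet caught up. Historically, brokers returned it in metadata responses while a replica was undergoing reassignment. Since Kafka 2.6, fetch requests and other requests intended only for the leader or a follower of the partition return NotLeaderOrFollowerException instead when the contacted broker is not a replica of that partition.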

The key thing to understand is that this exception is usually transient and informational, not a failure of your produce or fetch. As long as the partition has a leader and a sufficient in-sync replica set, reads and writes continue normally. The exception is signaling “this specific replica isn’t ready,” not “the partition is down.” Modern Kafka clients largely treat it as retriable and ignore it for routing purposes.

It matters operationally only when a replica that should be available stays unavailable, for example when a broker hosting a needed replica is offline and the ISR has shrunk to a level that threatens min.insync.replicas.

Common Causes

  • In-progress partition reassignment. A replica is being moved to a new broker and is not yet available there.
  • New replica catching up. After increasing replication factor or moving a replica, the follower is still replicating the log and not yet in ISR.
  • Broker hosting a replica is down or restarting, so that replica is temporarily unavailable.
  • Follower lag pushing a replica out of ISR temporarily.
  • Stale client metadata referencing a replica during a transition.
  • Disk or I/O pressure on a follower broker slowing replication enough that the replica drops out.

How to Reproduce the Error

Trigger a partition reassignment and inspect metadata while it is in flight:

# Observe the current replica placement
kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic events-v1
# During an operational reassignment of events-v1's replicas to new brokers,
# clients fetching metadata may briefly see REPLICA_NOT_AVAILABLE for the moving replica.

While a replica is being relocated to a new broker, a metadata request that arrives before the new replica is ready can surface the exception, then clear once the replica catches up and joins ISR.
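A minimal sketch of how such a reassignment could be triggered on a test cluster. The helper name reassignment_json, the file name move.json, the target broker id 6, and the throttle value are all illustrative assumptions; only the JSON layout and the kafka-reassign-partitions.sh flags come from the standard Kafka tooling.

```shell
# Emit a one-partition reassignment plan in the JSON layout that
# kafka-reassign-partitions.sh reads.
# Usage: reassignment_json TOPIC PARTITION BROKER_IDS   (BROKER_IDS like "2,3,6")
# reassignment_json is a hypothetical helper, not part of Kafka.
reassignment_json() {
  printf '{"version":1,"partitions":[{"topic":"%s","partition":%s,"replicas":[%s]}]}\n' \
    "$1" "$2" "$3"
}

# Example run against a test cluster (broker 6 and the throttle are assumptions):
#   reassignment_json events-v1 4 2,3,6 > move.json
#   kafka-reassign-partitions.sh --bootstrap-server kafka-1:9092 \
#     --reassignment-json-file move.json --throttle 10000000 --execute
#   kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic events-v1
#   kafka-reassign-partitions.sh --bootstrap-server kafka-1:9092 \
#     --reassignment-json-file move.json --verify
```

Running --verify after the move completes also removes the replication throttle that --execute set.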

Diagnostic Commands

Check replica placement, ISR, and which replicas are lagging:

kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic events-v1
kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --under-replicated-partitions
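To see which replica ids are actually missing from ISR, the --describe output can be filtered. The sketch below assumes the "Label: value" layout shown in the sample output earlier in this guide; lagging_replicas is a hypothetical helper name, and the parsing may need adjusting for your Kafka version.

```shell
# Print "topic partition missing-ids" for every partition whose Isr does not
# contain all assigned replicas. Reads kafka-topics.sh --describe output on stdin.
lagging_replicas() {
  awk '
    {
      topic = ""; part = ""; replicas = ""; isr = ""
      # Each value is the field that follows its label.
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic = $(i + 1)
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr = $(i + 1)
      }
      if (part == "" || replicas == "") next   # not a partition line
      if (isr ~ /:$/) isr = ""                 # empty Isr: the next label was read
      n = split(replicas, r, ",")
      m = split(isr, s, ",")
      split("", in_isr)                        # portable way to clear the array
      for (j = 1; j <= m; j++) in_isr[s[j]] = 1
      missing = ""
      for (j = 1; j <= n; j++) {
        if (!(r[j] in in_isr)) {
          sep = (missing == "") ? "" : ","
          missing = missing sep r[j]
        }
      }
      if (missing != "") print topic, part, missing
    }'
}

# Usage:
#   kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic events-v1 | lagging_replicas
```

A broker id that shows up repeatedly in the last column is the one to inspect first.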

Verify all expected brokers are up and serving (kafka-metadata-quorum.sh applies only to clusters running in KRaft mode):

kafka-broker-api-versions.sh --bootstrap-server kafka-1:9092
kafka-metadata-quorum.sh --bootstrap-server kafka-1:9092 describe --status

Look for reassignment and ISR-change activity, and check broker health:

sudo journalctl -u kafka --since "20 min ago" | grep -iE 'reassign|isr|shrink|expand|replica'
grep -iE 'min.insync.replicas|replica.lag.time.max.ms' /opt/kafka/config/server.properties
ss -ltnp | grep -E ':9092'

Step-by-Step Resolution

  1. Confirm a leader and ISR exist. Run kafka-topics.sh --describe. If the partition has a Leader and its Isr contains at least min.insync.replicas replicas, the exception is informational and your traffic is fine; no action is needed.
  2. Check for an active reassignment. Run kafka-reassign-partitions.sh --bootstrap-server kafka-1:9092 --list, or compare Replicas with Isr and search the logs for reassign. If replicas are mid-move, wait for the reassignment to complete; the exception clears as the new replica joins ISR.
  3. Find under-replicated partitions. Run kafka-topics.sh --describe --under-replicated-partitions. If the list is non-empty and not shrinking, a replica is genuinely stuck.
  4. Inspect the lagging replica’s broker. If a specific broker’s replicas are not catching up, check that broker with kafka-broker-api-versions.sh and systemctl status kafka, and review disk/IO and journalctl for errors.
  5. Restore any down broker. If the unavailable replica is on an offline broker, bringing it back lets the replica rejoin ISR.
  6. Verify ISR recovery. Re-describe the topic; once the replica is back in Isr and under-replicated count returns to zero, the warnings stop.
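The first check in the steps above can be roughly automated. This is a sketch, not a definitive health check: partition_health is a hypothetical helper, the min.insync.replicas value is passed in by hand, and it assumes a leaderless partition is printed as "Leader: none" (or -1 on older versions).

```shell
# Classify each partition line from kafka-topics.sh --describe (read on stdin).
# Usage: ... --describe --topic events-v1 | partition_health MIN_ISR
partition_health() {
  awk -v min_isr="$1" '
    {
      part = ""; leader = ""; replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr = $(i + 1)
      }
      if (part == "") next                     # not a partition line
      if (isr ~ /:$/) isr = ""                 # empty Isr: the next label was read
      n = split(replicas, r, ",")              # number of assigned replicas
      m = split(isr, s, ",")                   # number of in-sync replicas
      if (leader == "" || leader == "none" || leader == "-1") state = "NO_LEADER"
      else if (m < min_isr + 0)                state = "BELOW_MIN_ISR"
      else if (m < n)                          state = "UNDER_REPLICATED"
      else                                     state = "OK"
      print part, state
    }'
}
```

In this sketch, OK and a short-lived UNDER_REPLICATED correspond to the informational case; BELOW_MIN_ISR and NO_LEADER are the states that call for the later steps.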

Prevention and Best Practices

  • Treat ReplicaNotAvailableException as retriable and benign in client code; do not fail produce/consume paths on it.
  • Run reassignments with throttling during low-traffic windows so followers catch up without saturating the network or disks.
  • Keep replication factor 3 with min.insync.replicas=2 so that a single unavailable replica does not block acks=all writes.
  • Monitor UnderReplicatedPartitions and ISR shrink/expand rates; alert when under-replication persists rather than on every transient blip.
  • Watch follower broker disk and I/O — replication that can’t keep up is the usual reason a replica lingers out of ISR.
  • Stagger broker restarts and wait for ISR to fully recover between them.

Related Errors

  • LeaderNotAvailableException — there is no leader at all, a more serious state than a single replica being unavailable.
  • NotLeaderOrFollowerException — the contacted broker is neither leader nor follower for the partition (stale metadata).
  • NotEnoughReplicasException — ISR has dropped below min.insync.replicas and writes with acks=all are now rejected.
  • UnderReplicatedPartitions (metric) — the operational signal that replicas are not fully caught up.

Frequently Asked Questions

Is ReplicaNotAvailableException an error I need to fix? Usually not. It is most often transient and informational, emitted while a replica is being reassigned or catching up. If the partition has a leader and adequate ISR, your traffic is unaffected.

Why do I see it during a reassignment? A replica being moved to a new broker is not yet available there, so metadata responses note it. Once the new replica catches up and joins ISR, the message disappears.

When should I actually worry? When under-replicated partitions persist and do not shrink, or when ISR drops toward min.insync.replicas. That means a replica is genuinely stuck, often due to a down or overloaded broker.
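One way to turn "persists and does not shrink" into a rule is to sample the under-replicated partition count at intervals and only flag a run of non-zero samples that never decreases. The helper name urp_persistent, the sample counts, and the threshold of three samples are assumptions for illustration.

```shell
# Exit 0 when the most recent NEED samples are all non-zero and none is lower
# than the sample before it (under-replication is not draining).
# Usage: urp_persistent NEED COUNT [COUNT...]   (oldest sample first)
urp_persistent() {
  need=$1; shift
  streak=0; prev=0
  for count in "$@"; do
    if [ "$count" -gt 0 ] && [ "$count" -ge "$prev" ]; then
      streak=$((streak + 1))
    else
      streak=0            # zero, or shrinking: replicas are catching up
    fi
    prev=$count
  done
  [ "$streak" -ge "$need" ]
}

# Collecting one sample (counts partition lines in the under-replicated list):
#   kafka-topics.sh --bootstrap-server kafka-1:9092 --describe \
#     --under-replicated-partitions | grep -c 'Partition:'
```

For example, urp_persistent 3 0 2 2 2 succeeds, while urp_persistent 3 5 4 3 does not, because the count is still falling.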

How is this different from NotEnoughReplicasException? ReplicaNotAvailableException flags one unavailable replica and is typically harmless. NotEnoughReplicasException means ISR is too small to satisfy acks=all writes — those writes are actively rejected.

Should clients retry on it? Yes. It is retriable, and modern clients handle it automatically without disrupting produce or consume.
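Clients retry on their own, but shell scripts that wrap admin or CLI commands do not. A minimal sketch of a retry wrapper for such scripts follows; retry_replica_not_available is a hypothetical helper, and matching on the text REPLICA_NOT_AVAILABLE or ReplicaNotAvailableException in the command output is an assumption about how the error surfaces.

```shell
# Re-run a command while its output mentions the transient replica error.
# Usage: retry_replica_not_available MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_replica_not_available() {
  max=$1; delay=$2; shift 2
  attempt=1
  while :; do
    output=$("$@" 2>&1); status=$?
    case $output in
      *REPLICA_NOT_AVAILABLE*|*ReplicaNotAvailableException*)
        if [ "$attempt" -ge "$max" ]; then
          printf '%s\n' "$output"   # still failing: show the last output
          return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
        ;;
      *)
        printf '%s\n' "$output"     # any other result is passed through as-is
        return "$status"
        ;;
    esac
  done
}

# Usage:
#   retry_replica_not_available 5 2 \
#     kafka-topics.sh --bootstrap-server kafka-1:9092 --describe --topic events-v1
```

Only this one error is retried; any other failure returns immediately with the command's own exit status.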
