Kafka Error Guide: 'NotEnoughReplicasException: Messages are rejected' Fewer In-Sync Replicas Than Required

Exact Error Message

org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected
  since there are fewer in-sync replicas than required.
	at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:691)
	at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:612)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:325)
	at java.base/java.lang.Thread.run(Thread.java:840)

# Sibling exception when the append already landed but ISR shrank before ack:
org.apache.kafka.common.errors.NotEnoughReplicasAfterAppendException: Messages are
  written to the log, but to fewer in-sync replicas than required.

# Broker server.log shrinking the ISR just before the rejection:
[2026-06-29 09:15:42,331] INFO [Partition orders-3 broker=1] Shrinking ISR from
  1,2,3 to 1 (kafka.cluster.Partition)

What the Error Means

This error is a durability guard, not a transport failure. When a producer uses acks=all, the leader will not acknowledge a write until every replica in the in-sync replica set (ISR) has the data. The topic-level min.insync.replicas defines the floor: if the current ISR size is below that floor, the leader refuses the write with NotEnoughReplicasException. Kafka would rather reject your data than acknowledge a write it cannot durably guarantee.

There are two flavors:

NotEnoughReplicasException fires before the append. The leader checks ISR size up front, sees it is too small, and rejects without writing anything.
NotEnoughReplicasAfterAppendException fires after the local append. The leader wrote the record to its own log, but the ISR shrank below the floor before enough followers acknowledged, so it cannot confirm durability. This one is subtle: the record may physically exist on the leader yet still be reported as failed.

Both are retriable — once the ISR recovers, retried sends succeed.
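
The ordering of the two checks can be modelled in a few lines. This is a deliberately simplified sketch, not Kafka's broker code: the class IsrGuard, the Outcome enum, and the decide method are hypothetical names chosen for illustration, and the ISR sizes are passed in as plain integers.

```java
/** Simplified model of the leader's ISR checks (illustrative only, not broker code). */
final class IsrGuard {
    enum Outcome { ACKED, NOT_ENOUGH_REPLICAS, NOT_ENOUGH_REPLICAS_AFTER_APPEND }

    /**
     * @param acksAll        true when the producer sent acks=all
     * @param isrAtArrival   ISR size when the produce request reaches the leader
     * @param isrAfterAppend ISR size once the leader has appended locally
     * @param minIsr         the topic's min.insync.replicas
     */
    static Outcome decide(boolean acksAll, int isrAtArrival, int isrAfterAppend, int minIsr) {
        if (!acksAll) {
            return Outcome.ACKED; // the floor is not enforced for acks=0 or acks=1
        }
        if (isrAtArrival < minIsr) {
            return Outcome.NOT_ENOUGH_REPLICAS; // rejected up front, nothing written
        }
        if (isrAfterAppend < minIsr) {
            return Outcome.NOT_ENOUGH_REPLICAS_AFTER_APPEND; // on the leader's log, reported failed
        }
        return Outcome.ACKED;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 1, 1, 2));  // NOT_ENOUGH_REPLICAS
        System.out.println(decide(true, 2, 1, 2));  // NOT_ENOUGH_REPLICAS_AFTER_APPEND
        System.out.println(decide(true, 3, 3, 2));  // ACKED
        System.out.println(decide(false, 1, 1, 2)); // ACKED
    }
}
```

The second call is the subtle case: the ISR was large enough when the request arrived, so the record was appended, but the ISR shrank before the write could be confirmed.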

Common Causes

A replica broker is down. With replication.factor=3 and min.insync.replicas=2, losing one broker still leaves ISR=2 and you are fine. Losing two takes ISR to 1, below the floor, and writes fail.
A follower fell out of the ISR due to lag. A slow or GC-stalled follower that stops fetching for longer than replica.lag.time.max.ms is removed from the ISR by the leader. The partition is still online but no longer meets the durability floor.
replication.factor too low for the policy. A topic created with replication.factor=1 can never satisfy min.insync.replicas=2: the ISR holds at most one replica, so every acks=all write fails from the first send. This is a configuration contradiction, not a transient fault.
Cluster-wide min.insync.replicas raised without enough replicas to back it. Raising the broker default to 2 immediately breaks acks=all writes to every RF=1 topic that has no topic-level override.
Rolling restart or maintenance that takes brokers down faster than ISR can re-stabilize.
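
Most of these causes reduce to one subtraction: replication.factor minus min.insync.replicas is the number of replicas a partition can lose before acks=all writes are rejected. The sketch below spells that out; IsrBudget and its methods are hypothetical helper names, not part of any Kafka API.

```java
/** Arithmetic behind the durability floor (illustrative helper, not a Kafka API). */
final class IsrBudget {
    /** Replicas that can drop out of the ISR before acks=all writes fail; negative means never satisfiable. */
    static int tolerableReplicaLosses(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }

    static String describe(int replicationFactor, int minInsyncReplicas) {
        int budget = tolerableReplicaLosses(replicationFactor, minInsyncReplicas);
        if (budget < 0) {
            return "misconfigured: the ISR can never reach the floor";
        }
        if (budget == 0) {
            return "fragile: any single replica loss blocks acks=all writes";
        }
        return "tolerates " + budget + " replica loss(es)";
    }

    public static void main(String[] args) {
        System.out.println(describe(3, 2)); // tolerates 1 replica loss(es)
        System.out.println(describe(2, 2)); // fragile: any single replica loss blocks acks=all writes
        System.out.println(describe(1, 2)); // misconfigured: the ISR can never reach the floor
    }
}
```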

How to Reproduce the Error

Create a topic with replication.factor=3 and min.insync.replicas=2, then take two of the three replica brokers offline and produce with acks=all.

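import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReproduceNotEnoughReplicas {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092");
        p.put("acks", "all");                 // require full ISR acknowledgement
        p.put("retries", 0);                  // fail fast so we see the exception
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(p)) {
            // With ISR < min.insync.replicas, get() throws an ExecutionException
            // whose cause is NotEnoughReplicasException
            producer.send(new ProducerRecord<>("orders", "k", "v")).get();
        }
    }
}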

With acks=1 or acks=0 the same scenario would not throw — which is exactly why this error only appears in durability-sensitive pipelines.
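
Because the send result is a Future, calling get() reports the failure as an ExecutionException with the Kafka exception as its cause. The sketch below shows one way to unwrap it. It runs without a broker: the nested NotEnoughReplicasException is a stand-in for the real org.apache.kafka.common.errors class, and SendFailureInspector, awaitFailure, and isIsrRejection are hypothetical names. Real code would use instanceof against the Kafka classes rather than matching on the class name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class SendFailureInspector {
    /** Stand-in for the Kafka exception so the demo needs no client library. */
    static final class NotEnoughReplicasException extends RuntimeException {
        NotEnoughReplicasException(String message) { super(message); }
    }

    /** Waits on a send future and returns the underlying failure, or null on success. */
    static Throwable awaitFailure(Future<?> sendResult) {
        try {
            sendResult.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();                 // Future.get() wraps the real error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            return e;
        }
    }

    /** True for both the pre-append and the post-append ISR rejection. */
    static boolean isIsrRejection(Throwable failure) {
        return failure != null
                && failure.getClass().getSimpleName().startsWith("NotEnoughReplicas");
    }

    public static void main(String[] args) {
        CompletableFuture<Void> failedSend = new CompletableFuture<>();
        failedSend.completeExceptionally(new NotEnoughReplicasException(
                "Messages are rejected since there are fewer in-sync replicas than required."));
        Throwable cause = awaitFailure(failedSend);
        System.out.println(isIsrRejection(cause)); // true
    }
}
```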

Diagnostic Commands

The decisive signal is the gap between Replicas: and Isr: in the topic description. All of the commands below are read-only.

# Compare Replicas vs Isr. An Isr list shorter than min.insync.replicas is your problem.
kafka-topics.sh --bootstrap-server broker:9092 --describe --topic orders

# List only under-replicated partitions across the cluster
kafka-topics.sh --bootstrap-server broker:9092 --describe --under-replicated-partitions

# Confirm the topic's min.insync.replicas floor
kafka-configs.sh --bootstrap-server broker:9092 \
  --entity-type topics --entity-name orders --describe

# Check the broker default min.insync.replicas and unclean.leader.election.enable
kafka-configs.sh --bootstrap-server broker:9092 \
  --entity-type brokers --entity-name 1 --describe

# Watch the leader shrink/expand the ISR in real time
journalctl -u kafka -f | grep -i "Shrinking ISR\|Expanding ISR"

In --describe output, a healthy partition shows Replicas: 1,2,3 Isr: 1,2,3. A failing one shows something like Replicas: 1,2,3 Isr: 1 — only one in-sync replica against a floor of two.
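
The same comparison can be automated for a monitoring script. The sketch below parses one partition line of --describe output and checks the Isr count against a floor. It assumes the Replicas: and Isr: fields appear as shown above; the exact line layout varies between Kafka versions, and DescribeLineCheck, countIds, and belowFloor are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescribeLineCheck {
    static final Pattern REPLICAS = Pattern.compile("Replicas:\\s*([\\d,]+)");
    static final Pattern ISR = Pattern.compile("Isr:\\s*([\\d,]*)");

    /** Counts the broker ids listed in the given field; 0 if the field is missing or empty. */
    static int countIds(Pattern field, String line) {
        Matcher m = field.matcher(line);
        if (!m.find() || m.group(1).isEmpty()) {
            return 0;
        }
        return m.group(1).split(",").length;
    }

    /** True when the partition's ISR is smaller than min.insync.replicas. */
    static boolean belowFloor(String line, int minInsyncReplicas) {
        return countIds(ISR, line) < minInsyncReplicas;
    }

    public static void main(String[] args) {
        String failing = "Topic: orders\tPartition: 3\tLeader: 1\tReplicas: 1,2,3\tIsr: 1";
        String healthy = "Topic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3";
        System.out.println(belowFloor(failing, 2)); // true
        System.out.println(belowFloor(healthy, 2)); // false
    }
}
```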

Step-by-Step Resolution

Quantify the gap. Run kafka-topics.sh --describe and read Isr: versus Replicas:. If Isr count is below min.insync.replicas, that is the exact cause. --under-replicated-partitions shows the cluster-wide blast radius.
If a broker is simply down, the resolution is operational: bring it back. As it catches up, the leader logs Expanding ISR and writes resume. This is the common transient case and needs no config change.
If a follower is lagging but alive, find out why: disk saturation, network, or long GC pauses on that broker. The follower rejoins the ISR automatically once it has caught up with the leader. Do not raise replica.lag.time.max.ms to “fix” the symptom; that just keeps slow replicas in the ISR longer and hides the underlying problem, while acks=all writes wait on them.
If replication.factor is fundamentally too low (e.g. RF=1 with min.insync.replicas=2), the topic can never satisfy the policy. The durable target configuration is:
```
replication.factor=3
min.insync.replicas=2
# producer side
acks=all
```
Increasing replication factor is a partition-reassignment operation performed during a maintenance window — it is the correct fix, not a producer setting.
Keep unclean.leader.election.enable=false. Enabling it would let an out-of-sync replica become leader and silently drop acknowledged data — it papers over NotEnoughReplicasException by sacrificing the very durability the error protects.
Tune the producer to ride out transient ISR dips by allowing retries (see below) rather than failing fast.
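
The replication-factor increase described above is driven by a reassignment JSON file. As a rough sketch, the file can be generated like this, assuming a topic named orders, broker ids 1, 2, and 3, and a known partition count. ReassignmentPlan and its methods are hypothetical names. The sketch rotates the broker list per partition so the first-listed (preferred) replica is spread across brokers; it does not look at the current assignment, which a real plan should do to avoid moving data unnecessarily.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class ReassignmentPlan {
    /** One entry of the "partitions" array in the reassignment file. */
    static String partitionEntry(String topic, int partition, List<Integer> replicas) {
        String ids = replicas.stream().map(String::valueOf).collect(Collectors.joining(","));
        return "{\"topic\":\"" + topic + "\",\"partition\":" + partition
                + ",\"replicas\":[" + ids + "]}";
    }

    /** Builds the whole file, rotating the broker list so preferred leaders are spread out. */
    static String plan(String topic, int partitionCount, List<Integer> brokers) {
        StringBuilder json = new StringBuilder("{\"version\":1,\"partitions\":[");
        for (int p = 0; p < partitionCount; p++) {
            List<Integer> rotated = new ArrayList<>(brokers);
            Collections.rotate(rotated, -(p % brokers.size())); // shift left by p positions
            if (p > 0) {
                json.append(',');
            }
            json.append(partitionEntry(topic, p, rotated));
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(plan("orders", 2, List.of(1, 2, 3)));
    }
}
```

The printed JSON would be saved to a file and passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute, then checked with --verify.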

Prevention and Best Practices

Standardize on replication.factor=3 and min.insync.replicas=2 for durable topics. This tolerates one broker loss while still rejecting unsafe writes — the sweet spot.
Keep producer retries high (Integer.MAX_VALUE, the default since Kafka 2.1) and bound the total retry time with delivery.timeout.ms, so a brief ISR shrink during a rolling restart is retried transparently instead of surfacing as a hard failure.
Set enable.idempotence=true so those retries cannot create duplicates.
Orchestrate rolling restarts one broker at a time and wait for --under-replicated-partitions to return empty before moving to the next broker.
Alert on under-replicated partitions and on ISR shrink events; they are leading indicators of NotEnoughReplicasException before producers ever see it. An incident assistant can tie an ISR-shrink log line to the producer exceptions it caused.
Never enable unclean leader election on data you care about.
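
Put together, the producer-side settings from this list might look like the sketch below. It only builds a java.util.Properties object, so it runs without a broker. DurableProducerConfig is a hypothetical name, and the two-minute delivery.timeout.ms is an assumed value to adjust to how long your restarts actually take.

```java
import java.util.Properties;

final class DurableProducerConfig {
    /** Producer settings that ride out a short ISR dip instead of failing fast. */
    static Properties build(String bootstrapServers) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("acks", "all");                                 // enforce the min.insync.replicas floor
        p.put("enable.idempotence", "true");                  // retries cannot create duplicates
        p.put("retries", String.valueOf(Integer.MAX_VALUE));  // keep retrying while the ISR recovers
        p.put("delivery.timeout.ms", "120000");               // assumed upper bound on total send time
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }

    public static void main(String[] args) {
        build("broker:9092").forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties can be passed straight to new KafkaProducer<>(...). With these values a retriable NotEnoughReplicasException is retried until delivery.timeout.ms expires, and only then surfaces to the caller.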

Related Errors

NotEnoughReplicasAfterAppendException: the post-append sibling covered above; same root cause, different timing.
TimeoutException: when acks=all writes stall because followers are slow, you may hit a delivery timeout before the ISR check even resolves; the two often appear together during broker instability.
NetworkException: a broker disconnect during a rolling restart can both drop connections (NetworkException) and shrink the ISR (NotEnoughReplicasException) at the same moment.
UnknownProducerIdException: relevant when idempotent retries interact with replica/leadership churn.


Frequently Asked Questions

What is the difference between NotEnoughReplicasException and NotEnoughReplicasAfterAppendException? The first is thrown before any write — the leader sees the ISR is below min.insync.replicas and refuses. The second is thrown after the record was appended to the leader’s local log but the ISR shrank before enough followers confirmed, so durability cannot be guaranteed. With idempotence enabled, retrying the latter is safe and will not duplicate.

Will lowering min.insync.replicas to 1 make the error go away? It will, but it defeats the purpose. With min.insync.replicas=1 an acks=all write is acknowledged by the leader alone, so a leader failure immediately after the ack loses the data. Treat that only as a deliberate, temporary availability-over-durability trade-off, never a permanent fix.

Why do I only see this with acks=all and never with acks=1? The ISR floor is only enforced when the producer asks for full acknowledgement. With acks=1 the leader acks as soon as it writes locally, ignoring ISR size, so the durability guard never triggers — at the cost of weaker guarantees.

How do I read the ISR from kafka-topics.sh? In --describe output each partition line shows Replicas: (the assigned set) and Isr: (those currently in sync). If the Isr list is shorter than min.insync.replicas, writes with acks=all will be rejected. A healthy partition has identical Replicas and Isr lists.

Should I enable unclean leader election to avoid downtime? No, not for durable data. Unclean leader election lets a replica that was not in the ISR become leader, which can discard already-acknowledged records. It trades the controlled failure of NotEnoughReplicasException for silent data loss. Keep unclean.leader.election.enable=false.
