Kafka Error Guide: 'NotEnoughReplicasException: Messages are rejected' Fewer In-Sync Replicas Than Required
Resolve Kafka NotEnoughReplicasException and NotEnoughReplicasAfterAppendException: ISR dropped below min.insync.replicas under acks=all. Diagnose ISR and fix durability.
- #kafka
- #troubleshooting
- #errors
- #producer
Exact Error Message
org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected
since there are fewer in-sync replicas than required.
at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:691)
at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:612)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:325)
at java.base/java.lang.Thread.run(Thread.java:840)
# Sibling exception when the append already landed but ISR shrank before ack:
org.apache.kafka.common.errors.NotEnoughReplicasAfterAppendException: Messages are
written to the log, but to fewer in-sync replicas than required.
# Broker server.log shrinking the ISR just before the rejection:
[2026-06-29 09:15:42,331] INFO [Partition orders-3 broker=1] Shrinking ISR from
1,2,3 to 1 (kafka.cluster.Partition)
What the Error Means
This error is a durability guard, not a transport failure. When a producer uses acks=all, the leader will not acknowledge a write until every replica in the in-sync replica set (ISR) has the data. The topic-level min.insync.replicas defines the floor: if the current ISR size is below that floor, the leader refuses the write with NotEnoughReplicasException. Kafka would rather reject your data than acknowledge a write it cannot durably guarantee.
There are two flavors:
- NotEnoughReplicasException fires before the append. The leader checks ISR size up front, sees it is too small, and rejects without writing anything.
- NotEnoughReplicasAfterAppendException fires after the local append. The leader wrote the record to its own log, but the ISR shrank below the floor before enough followers acknowledged, so it cannot confirm durability. This one is subtle: the record may physically exist on the leader yet still be reported as failed.
Both are retriable — once the ISR recovers, retried sends succeed.
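The two flavors can be pictured with a small model of the leader's decision. This is a simplified sketch, not broker source code: the class name IsrGuardModel, the Outcome enum, and the idea of passing the ISR twice (once when the request arrives, once at acknowledgement time) are assumptions made purely for illustration.

```java
import java.util.Set;

/** Simplified model of the leader's acks=all checks. Hypothetical names, not Kafka code. */
final class IsrGuardModel {
    enum Outcome { ACKED, NOT_ENOUGH_REPLICAS, NOT_ENOUGH_REPLICAS_AFTER_APPEND }

    /**
     * @param isrBeforeAppend   ISR as the leader sees it when the produce request arrives
     * @param isrAtAckTime      ISR when the leader decides whether it can acknowledge
     * @param minInsyncReplicas the topic's min.insync.replicas
     */
    static Outcome produceWithAcksAll(Set<Integer> isrBeforeAppend,
                                      Set<Integer> isrAtAckTime,
                                      int minInsyncReplicas) {
        // Pre-append check: the ISR is already too small, so nothing is written.
        if (isrBeforeAppend.size() < minInsyncReplicas) {
            return Outcome.NOT_ENOUGH_REPLICAS;
        }
        // (In the real broker, the local append happens at this point.)
        // Post-append check: the record is in the leader's log, but the ISR shrank.
        if (isrAtAckTime.size() < minInsyncReplicas) {
            return Outcome.NOT_ENOUGH_REPLICAS_AFTER_APPEND;
        }
        return Outcome.ACKED;
    }

    public static void main(String[] args) {
        System.out.println(produceWithAcksAll(Set.of(1), Set.of(1), 2));         // NOT_ENOUGH_REPLICAS
        System.out.println(produceWithAcksAll(Set.of(1, 2, 3), Set.of(1), 2));   // NOT_ENOUGH_REPLICAS_AFTER_APPEND
        System.out.println(produceWithAcksAll(Set.of(1, 2), Set.of(1, 2), 2));   // ACKED
    }
}
```

The second call mirrors the log excerpt above: the ISR was 1,2,3 when the request arrived and 1 by the time the leader tried to acknowledge.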
Common Causes
- A replica broker is down. With replication.factor=3 and min.insync.replicas=2, losing one broker still leaves ISR=2 and you are fine. Losing two takes ISR to 1, below the floor, and writes fail.
- A follower fell out of the ISR due to lag. A slow or GC-stalled follower that stops fetching for longer than replica.lag.time.max.ms is removed from the ISR by the leader. The partition is still online but no longer meets the durability floor.
- replication.factor too low for the policy. A topic created with replication.factor=1 can never satisfy min.insync.replicas=2, so every acks=all write fails, starting with the first. This is a configuration contradiction, not a transient fault.
- Cluster-wide min.insync.replicas raised without enough replicas to back it. Bumping the broker default to 2 instantly breaks acks=all writes to every RF=1 topic that has no topic-level override.
- Rolling restart or maintenance that takes brokers down faster than the ISR can re-stabilize.
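The arithmetic behind these causes is simply replication.factor minus min.insync.replicas. A minimal sketch, assuming a hypothetical helper class IsrBudget that is not part of any Kafka API:

```java
/** Hypothetical helper: how many replicas a partition can lose before acks=all writes fail. */
final class IsrBudget {
    /** Negative means the topic can never accept acks=all writes. */
    static int tolerableReplicaLosses(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }

    static String describe(int replicationFactor, int minInsyncReplicas) {
        int budget = tolerableReplicaLosses(replicationFactor, minInsyncReplicas);
        if (budget < 0) {
            return "contradiction: replication.factor is below min.insync.replicas";
        }
        if (budget == 0) {
            return "fragile: losing any replica blocks acks=all writes";
        }
        return "tolerates " + budget + " replica loss(es)";
    }
}
```

For RF=3 and min.insync.replicas=2 the budget is 1, which matches the first cause: one broker down is fine, two is not. For RF=1 and min.insync.replicas=2 the budget is negative, which is the configuration contradiction.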
How to Reproduce the Error
Create a topic with replication.factor=3 and min.insync.replicas=2, then take two of the three replica brokers offline and produce with acks=all.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("acks", "all"); // require full ISR acknowledgement
p.put("retries", 0); // fail fast so we see the exception
p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
try (Producer<String, String> producer = new KafkaProducer<>(p)) {
    // With ISR < min.insync.replicas, get() throws ExecutionException whose cause
    // is NotEnoughReplicasException; get() also declares InterruptedException.
    producer.send(new ProducerRecord<>("orders", "k", "v")).get();
}
With acks=1 or acks=0 the same scenario would not throw — which is exactly why this error only appears in durability-sensitive pipelines.
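When the send is awaited with get(), the broker error usually arrives wrapped in an ExecutionException, so application code has to look through the cause chain. The sketch below is an assumption-laden illustration: IsrErrorClassifier is a hypothetical class, and it matches on class names only so that it compiles without the kafka-clients jar. With the client library on the classpath, an instanceof check against the two exception classes is the more idiomatic choice.

```java
import java.util.Set;

/** Hypothetical helper that spots ISR-shortfall errors anywhere in a cause chain. */
final class IsrErrorClassifier {
    // Fully qualified names as they appear in the error messages quoted in this guide.
    static final Set<String> ISR_ERRORS = Set.of(
            "org.apache.kafka.common.errors.NotEnoughReplicasException",
            "org.apache.kafka.common.errors.NotEnoughReplicasAfterAppendException");

    /** True if t, or any of its causes, has one of the given class names. */
    static boolean causedBy(Throwable t, Set<String> classNames) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (classNames.contains(c.getClass().getName())) {
                return true;
            }
        }
        return false;
    }

    static boolean isIsrShortfall(Throwable t) {
        return causedBy(t, ISR_ERRORS);
    }
}
```

A caller would typically catch ExecutionException around get() and pass the exception to isIsrShortfall to decide whether to log the failure as a durability rejection rather than a transport error.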
Diagnostic Commands
The decisive signal is the gap between Replicas: and Isr: in the topic description. The commands below are all read-only.
# Compare Replicas vs Isr. An Isr list shorter than min.insync.replicas is your problem.
kafka-topics.sh --bootstrap-server broker:9092 --describe --topic orders
# List only under-replicated partitions across the cluster
kafka-topics.sh --bootstrap-server broker:9092 --describe --under-replicated-partitions
# Confirm the topic's min.insync.replicas floor
kafka-configs.sh --bootstrap-server broker:9092 \
--entity-type topics --entity-name orders --describe
# Check the broker default min.insync.replicas and unclean.leader.election.enable
# (--all also lists static and default values, not only dynamic overrides)
kafka-configs.sh --bootstrap-server broker:9092 \
--entity-type brokers --entity-name 1 --describe --all
# Watch the leader shrink/expand the ISR in real time
journalctl -u kafka -f | grep -i "Shrinking ISR\|Expanding ISR"
In --describe output, a healthy partition shows Replicas: 1,2,3 Isr: 1,2,3. A failing one shows something like Replicas: 1,2,3 Isr: 1 — only one in-sync replica against a floor of two.
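The comparison can be automated by counting broker ids in the two fields. This is a minimal sketch, assuming partition lines that contain "Replicas: <ids>" and "Isr: <ids>" as in the examples above; the class DescribeLineCheck and the sample lines in the usage note are illustrative, and the exact layout of --describe output can vary between Kafka versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for one partition line of kafka-topics.sh --describe output. */
final class DescribeLineCheck {
    private static final Pattern REPLICAS = Pattern.compile("\\bReplicas:\\s*([0-9,]*)");
    private static final Pattern ISR = Pattern.compile("\\bIsr:\\s*([0-9,]*)");

    /** Counts the comma-separated broker ids in the first match of the given field. */
    static int countIds(Pattern field, String line) {
        Matcher m = field.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found in: " + line);
        }
        String ids = m.group(1);
        return ids.isEmpty() ? 0 : ids.split(",").length;
    }

    /** True when acks=all writes to this partition would be rejected. */
    static boolean belowFloor(String partitionLine, int minInsyncReplicas) {
        return countIds(ISR, partitionLine) < minInsyncReplicas;
    }

    /** True when the ISR is smaller than the assigned replica set. */
    static boolean underReplicated(String partitionLine) {
        return countIds(ISR, partitionLine) < countIds(REPLICAS, partitionLine);
    }
}
```

Fed a line such as "Topic: orders Partition: 3 Leader: 1 Replicas: 1,2,3 Isr: 1" with a floor of 2, both checks report a problem. Only partition lines should be passed in; the topic summary line has no Isr: field and makes countIds throw.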
Step-by-Step Resolution
- Quantify the gap. Run kafka-topics.sh --describe and read Isr: versus Replicas:. If the Isr count is below min.insync.replicas, that is the exact cause. --under-replicated-partitions shows the cluster-wide blast radius.
- If a broker is simply down, the resolution is operational: bring it back. As it catches up, the leader logs Expanding ISR and writes resume. This is the common transient case and needs no config change.
- If a follower is lagging but alive, find out why: disk saturation, network, or long GC pauses on that broker. The follower rejoins the ISR automatically once it has caught up with the leader. Do not raise replica.lag.time.max.ms to “fix” the symptom; that only keeps stalled replicas in the ISR longer, delays the acks=all writes that wait on them, and hides the underlying problem.
- If replication.factor is fundamentally too low (e.g. RF=1 with min.insync.replicas=2), the topic can never satisfy the policy. The durable target configuration is:
    replication.factor=3
    min.insync.replicas=2
    # producer side
    acks=all
  Increasing the replication factor is a partition-reassignment operation, best performed during a maintenance window. It is the correct fix; no producer setting substitutes for it.
- Keep unclean.leader.election.enable=false. Enabling it would let an out-of-sync replica become leader and silently drop acknowledged data. It papers over NotEnoughReplicasException by sacrificing the very durability the error protects.
- Tune the producer to ride out transient ISR dips by allowing retries (see Prevention and Best Practices) rather than failing fast.
Prevention and Best Practices
- Standardize on replication.factor=3 and min.insync.replicas=2 for durable topics. This tolerates one broker loss while still rejecting unsafe writes, which is the sweet spot.
- Set producer retries high (or Integer.MAX_VALUE) with a sane delivery.timeout.ms, so a brief ISR shrink during a rolling restart is retried transparently instead of surfacing as a hard failure.
- Enable enable.idempotence=true so those retries cannot create duplicates.
- Orchestrate rolling restarts one broker at a time and wait for --under-replicated-partitions to return empty before moving to the next broker.
- Alert on under-replicated partitions and on ISR shrink events; they are leading indicators of NotEnoughReplicasException before producers ever see it. An incident assistant can tie an ISR-shrink log line to the producer exceptions it caused.
- Never enable unclean leader election on data you care about.
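The producer-side recommendations above can be collected into one configuration. This is a sketch rather than a tuned production setup: DurableProducerConfig is a hypothetical helper, and the 120000 ms delivery timeout is an assumed starting value to be sized to the longest ISR dip you expect to ride out. Values are passed as strings, which the Kafka client parses.

```java
import java.util.Properties;

/** Hypothetical helper that builds producer settings for durability-sensitive topics. */
final class DurableProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("acks", "all");                                  // wait for the full ISR
        p.put("enable.idempotence", "true");                   // retries cannot duplicate records
        p.put("retries", String.valueOf(Integer.MAX_VALUE));   // keep retrying through ISR dips
        p.put("delivery.timeout.ms", "120000");                // assumed upper bound on total send time
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }
}
```

The resulting Properties object can be passed to new KafkaProducer<>(...) in place of the fail-fast settings used to reproduce the error. With retries unbounded, delivery.timeout.ms becomes the effective limit on how long a record may wait for the ISR to recover.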
Related Errors
- NotEnoughReplicasAfterAppendException: the post-append sibling covered above; same root cause, different timing.
- TimeoutException: when acks=all writes stall because followers are slow, you may hit a delivery timeout before the ISR check even resolves; the two often appear together during broker instability.
- NetworkException: a broker disconnect during a rolling restart can both drop connections (NetworkException) and shrink the ISR (NotEnoughReplicasException) at the same moment.
- UnknownProducerIdException: relevant when idempotent retries interact with replica/leadership churn.
See the full Kafka guides for the surrounding durability topics.
Frequently Asked Questions
What is the difference between NotEnoughReplicasException and NotEnoughReplicasAfterAppendException?
The first is thrown before any write — the leader sees the ISR is below min.insync.replicas and refuses. The second is thrown after the record was appended to the leader’s local log but the ISR shrank before enough followers confirmed, so durability cannot be guaranteed. With idempotence enabled, retrying the latter is safe and will not duplicate.
Will lowering min.insync.replicas to 1 make the error go away?
It will, but it defeats the purpose. With min.insync.replicas=1 an acks=all write is acknowledged by the leader alone, so a leader failure immediately after the ack loses the data. Treat that only as a deliberate, temporary availability-over-durability trade-off, never a permanent fix.
Why do I only see this with acks=all and never with acks=1?
The ISR floor is only enforced when the producer asks for full acknowledgement. With acks=1 the leader acks as soon as it writes locally, ignoring ISR size, so the durability guard never triggers — at the cost of weaker guarantees.
How do I read the ISR from kafka-topics.sh?
In --describe output each partition line shows Replicas: (the assigned set) and Isr: (those currently in sync). If the Isr list is shorter than min.insync.replicas, writes with acks=all will be rejected. A healthy partition has identical Replicas and Isr lists.
Should I enable unclean leader election to avoid downtime?
No, not for durable data. Unclean leader election lets a replica that was not in the ISR become leader, which can discard already-acknowledged records. It trades the controlled failure of NotEnoughReplicasException for silent data loss. Keep unclean.leader.election.enable=false.