Kafka Error Guide: 'TimeoutException: Expiring N record(s)'

Exact Error Message

This exception is delivered to your producer’s send() callback (or thrown from a blocking future.get()) when a batch cannot be acknowledged inside its allotted time budget:

org.apache.kafka.common.errors.TimeoutException: Expiring 5 record(s) for orders-0:30000 ms has passed since batch creation
	at org.apache.kafka.clients.producer.internals.ProducerBatch.completeExceptionally(ProducerBatch.java:...)
	at org.apache.kafka.clients.producer.internals.RecordAccumulator.expireBatches(RecordAccumulator.java:...)
	at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:...)
	at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:...)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:...)
	at java.base/java.lang.Thread.run(Thread.java:840)

In application logs it usually surfaces through the asynchronous callback rather than on the calling thread:

[2026-06-29 14:02:11,883] ERROR Failed to deliver record to orders (com.acme.OrderProducer)
org.apache.kafka.common.errors.TimeoutException: Expiring 5 record(s) for orders-0:30000 ms has passed since batch creation

A closely related variant is raised by the broker rather than the client: Produce request timed out (mapped to the broker-side REQUEST_TIMED_OUT error code), which the producer logs as a retriable TimeoutException for the specific in-flight request. In the client-side message, orders-0 identifies the topic-partition whose batch expired, and 30000 ms is the time elapsed since that batch was created, measured against delivery.timeout.ms.
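When you want to aggregate these failures by partition, the message is regular enough to parse. The following is a minimal sketch, assuming the message format shown above; ExpiryMessage is a hypothetical helper and is not part of kafka-clients.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of kafka-clients: extracts the fields of an expiry message. */
final class ExpiryMessage {
    // Greedy (.+) backtracks to the last "-<digits>:" so topic names containing dashes still parse.
    private static final Pattern EXPIRY = Pattern.compile(
        "Expiring (\\d+) record\\(s\\) for (.+)-(\\d+):\\s*(\\d+) ms has passed since batch creation");

    final int recordCount;
    final String topic;
    final int partition;
    final long elapsedMs;

    private ExpiryMessage(int recordCount, String topic, int partition, long elapsedMs) {
        this.recordCount = recordCount;
        this.topic = topic;
        this.partition = partition;
        this.elapsedMs = elapsedMs;
    }

    /** Returns the parsed fields, or empty if the line is not an expiry message. */
    static Optional<ExpiryMessage> parse(String line) {
        Matcher m = EXPIRY.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ExpiryMessage(
            Integer.parseInt(m.group(1)),
            m.group(2),
            Integer.parseInt(m.group(3)),
            Long.parseLong(m.group(4))));
    }
}
```

Feeding it the log line above yields recordCount 5, topic orders, partition 0, and elapsedMs 30000; lines that do not match come back as an empty Optional.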

What the Error Means

Every record you call send() on is appended to an in-memory RecordAccumulator batch keyed by topic-partition. From the moment that batch is created, a clock starts ticking against delivery.timeout.ms (default 120000 ms). If the batch has not been acknowledged by all required replicas before that deadline, the producer gives up, removes the batch from the accumulator, and completes its callback exceptionally with TimeoutException.

The message wording is precise: “Expiring 5 record(s)” means five records were sitting in that one batch, and “30000 ms has passed since batch creation” is the batch's age at expiry, which tells you delivery.timeout.ms was configured to about 30000. This is an end-to-end deadline that spans the time a batch waits in the accumulator (governed by linger.ms and batch.size, or by the lack of a reachable leader) and the time it spends in flight waiting for a broker response (governed by request.timeout.ms plus retries/retry.backoff.ms). Time that send() itself spends blocked, for example because buffer.memory is exhausted, is bounded separately by max.block.ms; the Kafka documentation defines delivery.timeout.ms as starting after send() returns. Any phase that stalls can blow the overall budget.

Crucially, expiration can happen before the record ever leaves the JVM. If the leader for orders-0 is unavailable, metadata is stale, or the broker is simply slow, the batch never gets sent and times out purely on the client side.
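To make the budget concrete, here is a rough model of how many produce attempts fit inside it. This is a sketch under simplifying assumptions: the batch waits the full linger.ms, every attempt runs to the full request.timeout.ms, and retry.backoff.ms is a fixed pause (some client versions back off exponentially). DeliveryBudget is a hypothetical helper, not a Kafka API.

```java
/** Back-of-the-envelope model of the delivery budget; hypothetical helper, not a Kafka API. */
final class DeliveryBudget {
    /**
     * Number of produce attempts that can each run to a full request timeout
     * before deliveryTimeoutMs elapses, assuming the batch first waits lingerMs
     * and every retry is preceded by a fixed retryBackoffMs pause.
     */
    static long fullAttempts(long deliveryTimeoutMs, long lingerMs,
                             long requestTimeoutMs, long retryBackoffMs) {
        long remaining = deliveryTimeoutMs - lingerMs;
        // k attempts cost k * request + (k - 1) * backoff, so solve for the largest k.
        long attempts = (remaining + retryBackoffMs) / (requestTimeoutMs + retryBackoffMs);
        return Math.max(0, attempts);
    }

    public static void main(String[] args) {
        // Tight budget: delivery 30000, linger 5, request 10000, backoff 100
        System.out.println(fullAttempts(30_000, 5, 10_000, 100));   // prints 2
        // Roomier budget: delivery 120000, linger 0, request 30000, backoff 100
        System.out.println(fullAttempts(120_000, 0, 30_000, 100));  // prints 3
    }
}
```

With a 30-second budget and a 10-second request timeout, only two attempts can time out in full before the batch expires, which is why a single slow broker can exhaust a tight delivery.timeout.ms.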

Common Causes

delivery.timeout.ms set too low. Lowering it to 30000 (as in the message) leaves little headroom once you account for request.timeout.ms plus retries. The constraint delivery.timeout.ms >= linger.ms + request.timeout.ms must hold, and the producer will refuse to start if it does not.
Broker unreachable or leader unavailable. A partition whose leader is down (or mid-election) has no destination, so batches accumulate and expire. Look for NOT_LEADER_OR_FOLLOWER and metadata refreshes preceding the timeout.
Slow broker / overloaded cluster. High request-handler queue depth, slow disks, or acks=all with a lagging follower makes each produce request slow enough that retries exhaust the budget.
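buffer.memory exhausted. When the 32 MB default buffer fills (downstream is slower than your produce rate), send() blocks for up to max.block.ms. That wait is bounded by max.block.ms rather than delivery.timeout.ms, but a full buffer means batches are draining more slowly than they are produced, so the batches already queued are aging toward expiry.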
Network saturation or throttling. Quota throttling (producer_byte_rate) or a saturated NIC delays responses, tripping request.timeout.ms repeatedly.
Large batches with aggressive linger. A high linger.ms combined with small throughput means batches sit waiting and can age out under a tight delivery timeout.
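The constraint delivery.timeout.ms >= linger.ms + request.timeout.ms from the first cause can be checked before a producer is ever constructed, for example in a unit test over your config files. The following is a sketch only: TimeoutInvariant is a hypothetical helper, and the fallback values (120000, 30000, and a linger.ms of 0) are assumptions that you should compare with the defaults of your client version.

```java
import java.util.Properties;

/** Hypothetical pre-flight check for producer timeout settings; not part of kafka-clients. */
final class TimeoutInvariant {
    /** Headroom left after linger.ms + request.timeout.ms; negative means the invariant is broken. */
    static long slackMs(Properties props) {
        // Fallbacks are assumed defaults; verify them against your client version.
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms", "120000"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms", "30000"));
        long linger = Long.parseLong(props.getProperty("linger.ms", "0"));
        return delivery - (linger + request);
    }

    /** Throws if delivery.timeout.ms is smaller than linger.ms + request.timeout.ms. */
    static void requireValid(Properties props) {
        long slack = slackMs(props);
        if (slack < 0) {
            throw new IllegalArgumentException(
                "delivery.timeout.ms is " + (-slack) + " ms short of linger.ms + request.timeout.ms");
        }
    }
}
```

A positive slack is the time left for retries after the first request has timed out once, so a value close to zero is a warning sign even when the config is technically valid.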

How to Reproduce the Error

Point a producer at a partition whose leader you can make unreachable, and set a deliberately tight delivery budget:

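bootstrap.servers=kafka-1.internal:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
acks=all
delivery.timeout.ms=30000
request.timeout.ms=10000
linger.ms=5
batch.size=16384
buffer.memory=33554432
max.block.ms=60000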

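import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// loadProducerProps() stands for your own helper that loads the properties above.
Properties props = loadProducerProps();
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    for (int i = 0; i < 5; i++) {
        // The callback runs on the producer's I/O thread once the batch succeeds or expires.
        producer.send(new ProducerRecord<>("orders", "key-" + i, "payload-" + i),
            (md, ex) -> { if (ex != null) ex.printStackTrace(); });
    }
    producer.flush(); // callbacks fire with TimeoutException once 30s elapses
}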

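Stop the broker hosting the orders-0 leader (or block its port 9092 with a firewall rule) before the sends run, while keeping at least one bootstrap broker reachable so that send() can still fetch metadata. This is simplest on a topic with a replication factor of 1, where no other replica can take over leadership. With no leader to accept the batch, the five buffered records expire at the 30-second mark.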

Diagnostic Commands

Confirm the partition has a live leader and healthy ISR:

kafka-topics.sh --bootstrap-server kafka-1.internal:9092 --describe --topic orders
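
If you run this check often, the output can be scanned programmatically. The sketch below assumes each partition line has the shape "Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3" and that a missing leader is printed as -1 or none; the exact layout varies between Kafka versions, and DescribeOutputCheck is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that flags unhealthy partitions in `kafka-topics.sh --describe` output. */
final class DescribeOutputCheck {
    // Assumed line shape: "Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3"
    private static final Pattern PARTITION_LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(\\S+)\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    /** Returns one human-readable finding per leaderless or under-replicated partition. */
    static List<String> findProblems(List<String> describeLines) {
        List<String> problems = new ArrayList<>();
        for (String line : describeLines) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (!m.find()) {
                continue; // topic summary line or anything unrecognised
            }
            String partition = m.group(1);
            String leader = m.group(2);
            int replicas = count(m.group(3));
            int isr = count(m.group(4));
            if (leader.equals("-1") || leader.equalsIgnoreCase("none")) {
                problems.add("partition " + partition + " has no leader");
            } else if (isr < replicas) {
                problems.add("partition " + partition + " ISR shrunk to " + isr + " of " + replicas);
            }
        }
        return problems;
    }

    private static int count(String csv) {
        return csv.isEmpty() ? 0 : csv.split(",").length;
    }
}
```

An empty result means every recognised partition line showed a leader and a full ISR; it does not prove that the leader is reachable from the producer's network.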

Inspect the effective broker config that affects producers (request handler threads, message size limits). Without --all, --describe lists only dynamic overrides:

kafka-configs.sh --bootstrap-server kafka-1.internal:9092 --describe --all --entity-type brokers --entity-name 1

Verify broker API versions are reachable (a clean response proves basic connectivity):

kafka-broker-api-versions.sh --bootstrap-server kafka-1.internal:9092

Scan client logs for the sequence of metadata refreshes and disconnects that precede expiration:

grep -E "Expiring|NOT_LEADER|disconnected|metadata" /var/log/orders-producer/app.log

Check the broker’s own log for slow-request or replication warnings around the timestamp:

journalctl -u kafka --since "2026-06-29 14:00:00" --until "2026-06-29 14:05:00" | grep -iE "slow|timed out|isr"

Step-by-Step Resolution

Re-establish a healthy leader first. If --describe shows Leader: -1 or a shrunken ISR, the timeout is a symptom of an unavailable partition. Restore the broker or wait out the election before tuning anything.
Give the delivery budget realistic headroom. Raise delivery.timeout.ms back toward the default and keep the invariant delivery.timeout.ms >= linger.ms + request.timeout.ms. A sane starting point:
```
delivery.timeout.ms=120000
request.timeout.ms=30000
linger.ms=20
```
Right-size batching. Increase batch.size (e.g. 65536) and use a modest linger.ms so batches fill efficiently rather than dribbling out; this reduces request count and per-record overhead under load.
Relieve buffer pressure. If logs show send() blocking, raise buffer.memory or reduce produce rate. Watch the buffer-available-bytes and bufferpool-wait-ratio JMX metrics.
Tune for slow brokers, not against them. Keep retries high (the default is effectively Integer.MAX_VALUE) and let delivery.timeout.ms be the real cap. With enable.idempotence=true, retries stay safe and ordered.
Reconsider acks=all cost. If min.insync.replicas plus a lagging follower is the bottleneck, fix replication health rather than weakening durability.
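
The settings from the steps above can be collected in one place. This is a sketch, assuming String keys and values as in the reproduction example; TunedProducerProps is a hypothetical class, and the keys are written as plain strings so the snippet needs nothing beyond the JDK.

```java
import java.util.Properties;

/** Sketch of the settings from the resolution steps; keys are standard producer config names. */
final class TunedProducerProps {
    static Properties build(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        // Assumes String keys and values, as in the reproduction example.
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Durability and safe retries.
        p.setProperty("acks", "all");
        p.setProperty("enable.idempotence", "true");
        // Delivery budget with headroom: 120000 >= 20 + 30000.
        p.setProperty("delivery.timeout.ms", "120000");
        p.setProperty("request.timeout.ms", "30000");
        p.setProperty("linger.ms", "20");
        // Larger batches to cut request count under load.
        p.setProperty("batch.size", "65536");
        // retries is deliberately left unset so delivery.timeout.ms stays the real cap.
        return p;
    }
}
```

Passing the result to new KafkaProducer<>(props) gives a producer whose retries are limited by the delivery budget rather than by a retry count.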

Prevention and Best Practices

Treat delivery.timeout.ms as your single end-to-end SLA knob and derive the others from it; never set it below linger.ms + request.timeout.ms.
Alert on the producer JMX metrics record-error-rate, request-latency-avg, and bufferpool-wait-ratio so you see pressure before batches expire.
Keep enable.idempotence=true so aggressive retries do not introduce duplicates or reordering.
Capacity-plan brokers for acks=all: a slow follower silently inflates produce latency.

Related Errors

NetworkException: transient connection drops to the leader often manifest as repeated retries that ultimately roll up into this delivery TimeoutException.
NotEnoughReplicasException: when acks=all and ISR falls below min.insync.replicas, produce requests stall and can age out, so you may see both errors together.
RecordTooLargeException: a different failure mode (rejected immediately, not on a timer), but worth ruling out when only some records fail.

Frequently Asked Questions

Does increasing request.timeout.ms fix this? Only partially. request.timeout.ms caps a single in-flight produce request; delivery.timeout.ms caps the whole lifecycle including queue time and retries. If batches expire while waiting for a leader, raising request.timeout.ms alone changes nothing — raise the delivery budget instead.

Why do I see “since batch creation” when the record never left my app? The clock starts when the batch is created in the accumulator, not when it is sent. A record can expire purely client-side if there is no available leader to send it to, which is exactly why this error often points to broker or metadata problems.

Can linger.ms cause timeouts? Indirectly. A large linger.ms makes batches wait longer before sending, consuming part of the delivery budget. Under low throughput with a tight delivery.timeout.ms, that wait can be the difference between success and expiration.

Will retries make this worse? No — retries are bounded by delivery.timeout.ms, not by the retries count. With idempotence enabled, generous retries are safe and are usually the right way to survive a slow or briefly unavailable broker.

Is this error retriable from my code? This specific expiration is terminal for those records; they were dropped from the buffer. You must re-send() them yourself. Design your callback to re-enqueue or dead-letter expired records rather than assuming the client will retry them.
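
One way to structure that is to have the callback only record the failure and let an application thread do the actual re-send, since the callback runs on the producer's I/O thread. The sketch below is plain bookkeeping under stated assumptions: ExpiredRecordRouter is a hypothetical class, resend attempts are counted per record key, and keys and values are non-null.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical bookkeeping for records whose batch expired; not part of kafka-clients. */
final class ExpiredRecordRouter<K, V> {
    private final int maxResends;
    private final Map<K, Integer> expiryCounts = new HashMap<>();
    private final Queue<Map.Entry<K, V>> resendQueue = new ArrayDeque<>();
    private final List<Map.Entry<K, V>> deadLetters = new ArrayList<>();

    ExpiredRecordRouter(int maxResends) {
        this.maxResends = maxResends;
    }

    /** Call from the send() callback when the exception is the expiry TimeoutException. */
    synchronized void onExpired(K key, V value) {
        int expiries = expiryCounts.merge(key, 1, Integer::sum);
        if (expiries <= maxResends) {
            resendQueue.add(Map.entry(key, value));   // an application thread will re-send it
        } else {
            deadLetters.add(Map.entry(key, value));   // give up and park the record
            expiryCounts.remove(key);
        }
    }

    /** Next record to re-send, or null when nothing is waiting. */
    synchronized Map.Entry<K, V> pollResend() {
        return resendQueue.poll();
    }

    /** Snapshot of the records that exceeded the resend limit. */
    synchronized List<Map.Entry<K, V>> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```

With maxResends set to 2, the first two expiries of a key are queued for re-sending and the third lands in the dead-letter list. Keeping the re-send off the callback avoids blocking the I/O thread that has to deliver every other batch.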
