Kafka Error Guide: 'UnknownProducerIdException' Producer ID Not Found by Broker
Fix Kafka UnknownProducerIdException: diagnose idempotent producer ID state evicted by short retention.ms, producer.id.expiration.ms expiry, and old-broker restarts; tune retention and adopt KIP-360.
- #kafka
- #troubleshooting
- #errors
- #producer
Exact Error Message
org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question.
In producer logs it typically surfaces as a fatal error that aborts the send, often wrapped by the sender thread:
[2026-06-29 06:42:17,553] ERROR [Producer clientId=order-ingest-7, transactionalId=null] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question.
[2026-06-29 06:42:17,556] WARN [Producer clientId=order-ingest-7] Got error produce response in correlation id 90183 on topic-partition events-12, splitting and retrying (1 attempts left). Error: UNKNOWN_PRODUCER_ID
What the Error Means
When you enable idempotence (enable.idempotence=true, the default since Kafka 3.0), the broker assigns each producer a unique producer ID (PID) and tracks, per partition, the last sequence number it accepted from that PID. This per-partition (producerId, epoch, sequence) state is what lets the broker deduplicate retried batches and guarantee exactly-once, in-order writes to each partition within a single producer session. The broker persists this state alongside the partition log in a producer-state snapshot.
UnknownProducerIdException means the producer sent a batch tagged with a PID, but the broker has no record of that PID for that partition. From the broker’s perspective the producer is a stranger: it cannot verify the sequence number, so it cannot safely deduplicate, and it refuses the write. The most common reason the metadata is gone is that the broker intentionally expired it. Producer state is not kept forever. If every record a PID ever wrote to a partition has aged out of the log (because of retention), or if the PID has been idle longer than the producer-ID expiration window, the broker discards the state to bound memory. When the producer later tries to continue the same session, its PID is no longer recognized.
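To make the rejection concrete, here is a deliberately simplified toy model of one partition's producer-state table, written in plain Java with no Kafka dependency. The class name ToyPartitionProducerState, its Result enum, and the one-sequence-number-per-batch simplification are assumptions made for illustration; the real broker also tracks epochs and a window of recent batches. The sketch shows only why a batch that continues an old sequence is refused once the PID's entry has been discarded.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one partition's producer-state table. Not Kafka's real implementation. */
final class ToyPartitionProducerState {
    enum Result { ACCEPTED, DUPLICATE, OUT_OF_ORDER_SEQUENCE, UNKNOWN_PRODUCER_ID }

    // producerId -> last sequence number accepted from that producer
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    Result append(long producerId, int sequence) {
        Integer last = lastSequence.get(producerId);
        if (last == null) {
            // No state for this PID. Sequence 0 looks like a brand-new producer and is
            // accepted; anything else claims a history this table cannot verify.
            if (sequence != 0) {
                return Result.UNKNOWN_PRODUCER_ID;
            }
            lastSequence.put(producerId, 0);
            return Result.ACCEPTED;
        }
        if (sequence == last) {
            return Result.DUPLICATE; // retried batch that was already written
        }
        if (sequence != last + 1) {
            return Result.OUT_OF_ORDER_SEQUENCE; // PID known, sequence unexpected
        }
        lastSequence.put(producerId, sequence);
        return Result.ACCEPTED;
    }

    /** Simulates retention or idle expiry discarding the PID's state. */
    void expire(long producerId) {
        lastSequence.remove(producerId);
    }
}
```

In this model, appending sequence 0 and then 1 succeeds, a repeat of 1 is reported as a duplicate, and after expire(pid) an append with sequence 2 returns UNKNOWN_PRODUCER_ID, which mirrors the idle-producer scenario described above.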
Common Causes
- Retention shorter than producer activity gaps: this is the classic cause on older brokers. If retention.ms is low and a producer writes infrequently, all of a PID’s records can be deleted from the log before the next write. With no surviving records, the broker drops the producer state and the next send fails with UNKNOWN_PRODUCER_ID.
- producer.id.expiration.ms elapsed: the broker evicts idle PID state after this window (default 24 hours / 86400000 ms). A producer that goes quiet longer than this and then resumes finds its PID expired.
- transactional.id.expiration.ms elapsed for transactional producers: if the transactional ID’s metadata expires (default 7 days), the coordinator forgets the mapping and the producer’s PID is no longer valid.
- Broker restart on older brokers losing PID state: pre-KIP-360 brokers handled lost producer state by raising a fatal UnknownProducerIdException that the client could not recover from, forcing an application restart.
- Aggressive log compaction or truncation that removes the tail records a PID was anchored to, on a compacted topic.
How to Reproduce the Error
The reliable way to reproduce on an older broker is to combine very short retention with an idle, idempotent producer. Configure a test topic with retention measured in seconds, send one record, wait for the segment to roll and be deleted, then send again on the same producer instance:
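import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Place the statements below in a method that declares "throws Exception":
// Future.get() and Thread.sleep() both throw checked exceptions.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    // First send establishes the PID's sequence state on the partition.
    producer.send(new ProducerRecord<>("short-retention", "k", "first")).get();
    // Topic retention.ms is a few seconds; all records for this PID expire.
    Thread.sleep(120_000);
    // The PID's state has been swept; this send fails on old brokers.
    producer.send(new ProducerRecord<>("short-retention", "k", "second")).get();
}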
On Kafka 2.5+ with KIP-360, the same scenario causes the client to transparently reset the PID rather than fail fatally, so reproduction of the fatal form requires an older broker/client combination.
Diagnostic Commands
Check the topic’s retention and cleanup policy, the usual culprit:
kafka-configs.sh --bootstrap-server broker:9092 --describe \
  --entity-type topics --entity-name events
kafka-topics.sh --bootstrap-server broker:9092 --describe --topic events
Inspect broker-level producer-ID and transaction expiration settings:
kafka-configs.sh --bootstrap-server broker:9092 --describe \
  --entity-type brokers --entity-name 1 --all | \
  grep -E "producer.id.expiration|transactional.id.expiration|transaction.timeout"
Confirm broker and client API versions to know whether KIP-360 recovery applies:
kafka-broker-api-versions.sh --bootstrap-server broker:9092 | head -20
Pull the error and PID context from the logs:
grep -i "UNKNOWN_PRODUCER_ID\|UnknownProducerId\|expired producer" \
  /var/log/kafka/server.log
journalctl -u kafka --since "2 hours ago" | grep -i producer
Step-by-Step Resolution
- Confirm the broker and client versions. If both are recent (Kafka 2.5+ broker, 2.5+ client), KIP-360 lets the client recover by resetting the PID automatically; an occasional logged UNKNOWN_PRODUCER_ID followed by a successful retry is benign. The fatal, application-killing form means at least one side is old.
- Upgrade old brokers and clients. The durable fix is moving to a KIP-360-aware version on both ends so a missing PID becomes a recoverable reset instead of a fatal abort. This single change eliminates most production occurrences.
- Align retention with producer cadence. If a producer writes to a partition less often than the data is retained, raise retention so a PID always has at least one surviving record. For a slow but important stream, target a retention comfortably longer than the maximum idle gap:
retention.ms=604800000
cleanup.policy=delete
- Tune producer-ID expiration if you run long-idle idempotent producers. The broker default is 24 hours; widen it (within memory budget) so idle producers are not evicted between bursts:
producer.id.expiration.ms=259200000
- For transactional producers, ensure transactional.id.expiration.ms exceeds your longest expected gap between transactions, and keep transaction.timeout.ms below it so individual transactions still time out promptly.
- As an application fallback on legacy stacks, catch the fatal exception, close the producer, and recreate it to obtain a fresh PID. This sacrifices the idempotence guarantee across the boundary but keeps the service running until you can upgrade.
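The close-and-recreate fallback from the last step can be sketched roughly as follows. This is a minimal illustration in plain Java rather than a drop-in implementation: RecreatingClient, its Action interface, and the isFatal predicate are hypothetical names, and the type parameter P stands in for a Kafka producer so the sketch compiles without the client library.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical wrapper: on a fatal error, closes the client and builds a new one. */
final class RecreatingClient<P extends AutoCloseable> {
    /** One unit of work against the client, for example a send followed by get(). */
    interface Action<P> {
        void run(P client) throws Exception;
    }

    private final Supplier<P> factory;
    private final Predicate<Throwable> isFatal;
    private P current;

    RecreatingClient(Supplier<P> factory, Predicate<Throwable> isFatal) {
        this.factory = factory;
        this.isFatal = isFatal;
        this.current = factory.get();
    }

    /** Runs the action; on a fatal error, replaces the client and tries once more. */
    void execute(Action<P> action) throws Exception {
        try {
            action.run(current);
        } catch (Exception e) {
            if (!isFatal.test(rootCause(e))) {
                throw e; // not the error this fallback is meant for
            }
            try {
                current.close();
            } catch (Exception ignored) {
                // best effort: the old client is already unusable
            }
            current = factory.get(); // a fresh client obtains a fresh producer ID
            action.run(current);     // may duplicate the record: old dedup state is gone
        }
    }

    private static Throwable rootCause(Throwable t) {
        while (t.getCause() != null && t.getCause() != t) {
            t = t.getCause();
        }
        return t;
    }
}
```

With kafka-clients on the classpath, the factory would typically be () -> new KafkaProducer<>(props) and the predicate t -> t instanceof UnknownProducerIdException. Unwrapping the root cause matters because send(...).get() reports failures wrapped in an ExecutionException. The single retry is a deliberate choice here: if the second attempt also fails, the exception propagates so the caller can decide what to do.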
Prevention and Best Practices
Treat retention.ms and producer write cadence as a coupled pair: any idempotent producer that can be idle longer than its data lives is a latent UNKNOWN_PRODUCER_ID. Standardize on KIP-360-capable broker and client versions across the fleet so the failure mode degrades to a transparent reset. For transactional/EOS workloads, keep the ordering transaction.timeout.ms < transactional.id.expiration.ms and avoid setting either so low that normal pauses trip expiry. Monitor the broker metric for expired producer IDs and alert when it climbs, since a rising rate predicts client-visible failures. Keep idempotent producers long-lived rather than creating one per message; short-lived producers churn PIDs and waste broker state.
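The coupling between idle gaps, retention, and expiration windows can be checked mechanically, for example in a deployment or CI step. The sketch below is one possible shape for such a check; PidExpiryRisk and its method names are hypothetical, the inputs are values you would read from your own topic and broker configuration, and it does not model unlimited retention (retention.ms=-1).

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that flags settings likely to lead to UNKNOWN_PRODUCER_ID. */
final class PidExpiryRisk {
    /** Returns one warning per window that the producer's longest idle gap can outlast. */
    static List<String> check(Duration maxIdleGap, Duration retention,
                              Duration producerIdExpiration) {
        List<String> warnings = new ArrayList<>();
        if (maxIdleGap.compareTo(retention) >= 0) {
            warnings.add("idle gap " + maxIdleGap + " >= retention.ms " + retention
                    + ": all of the PID's records can age out between writes");
        }
        if (maxIdleGap.compareTo(producerIdExpiration) >= 0) {
            warnings.add("idle gap " + maxIdleGap + " >= producer.id.expiration.ms "
                    + producerIdExpiration + ": idle PID state can be evicted");
        }
        return warnings;
    }

    /** Checks the ordering transaction.timeout.ms < transactional.id.expiration.ms. */
    static boolean transactionOrderingOk(Duration transactionTimeout,
                                         Duration transactionalIdExpiration) {
        return transactionTimeout.compareTo(transactionalIdExpiration) < 0;
    }
}
```

For instance, a producer that can be idle for two days against seven-day retention and the 24-hour producer-ID expiration default yields a single warning, about producer.id.expiration.ms, while the retention check passes.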
Related Errors
UnknownProducerIdException is closely tied to OutOfOrderSequenceException: both stem from the broker’s per-PID sequence tracking. Where this error means the PID is gone, an out-of-order sequence means the PID is known but a batch arrived with an unexpected sequence number, often after PID expiry resets the counter. Idempotent producers that hit retention or expiry problems may also surface TimeoutException during retries. If PID state is lost due to underlying segment corruption, you may see CorruptRecordException on the same partition.
Frequently Asked Questions
Is UnknownProducerIdException recoverable?
It depends on versions. On Kafka 2.5+ with KIP-360 on both broker and client, the client transparently resets the producer ID and retries, so it is effectively recoverable. On older versions it is fatal and aborts the producer, requiring a restart or manual producer recreation.
Why does this happen to an idle producer but not a busy one?
A busy producer keeps writing records, so its PID always has surviving data in the log and is refreshed before producer.id.expiration.ms elapses. An idle producer can outlast both its data’s retention and the expiration window, after which the broker discards its state.
What is the difference between producer.id.expiration.ms and retention.ms?
retention.ms controls how long records live in the log; producer.id.expiration.ms controls how long idle PID metadata is kept (default 24h). Either one elapsing can drop the PID state. They are independent, so check both.
Will increasing retention fix it permanently?
It fixes the retention-driven case and is the right move for slow producers, but it does not address PID expiration from idleness or the fatal behavior of old brokers. The complete fix is raising retention and running KIP-360-capable versions.
Does this affect data already written?
No. Records the broker already accepted are durable. The exception only blocks the current send whose PID can no longer be validated. You lose the guarantee of deduplicating that in-flight batch, not previously committed data.