Kafka Exactly-Once Semantics Explained

“Exactly-once” is the most misunderstood phrase in the Kafka vocabulary. People hear it and imagine some magical guarantee that a message is physically delivered one time across any system boundary, which is impossible. What Kafka actually provides is exactly-once processing semantics: within a Kafka-to-Kafka pipeline, the effect of processing each record is reflected exactly once, even across retries and failures. That is a narrower and far more useful claim. It is built from two mechanisms — the idempotent producer and transactions — combined in the read-process-write pattern. This guide explains how each piece works on Kafka 3.x, how to configure it, and where the guarantee starts and stops.

Delivery semantics: the three options

Before exactly-once, you need to understand the other two, because they define the failure modes it solves.

At-most-once: The producer fires and forgets, or the consumer commits offsets before processing. If a failure occurs, messages can be lost but never duplicated. Lowest latency, weakest guarantee.
At-least-once: The producer retries until it gets an acknowledgment, and consumers commit offsets after processing. Nothing is lost, but a retry after a lost ack produces a duplicate. This is the default behavior most people actually run.
Exactly-once: Combines no-loss with no-duplicates. Each record’s effect is applied once. Achieved through idempotence plus transactions.

The thing to internalize is that at-least-once is the natural state of a reliable distributed system. Retries are mandatory for durability, and retries cause duplicates. Exactly-once is the machinery that lets you keep the retries while removing the duplicates.

The idempotent producer

The first building block is the idempotent producer, and on modern Kafka clients it is on by default. Its job is to eliminate duplicates caused by producer retries to a single partition.

Here is the duplicate it prevents. A producer sends a record, the broker writes it and replicates it, but the acknowledgment is lost on the way back. The producer, seeing no ack, retries. Without idempotence the broker writes the record a second time. With idempotence, it does not.

The mechanism is a producer ID (PID) and per-partition sequence numbers. When idempotence is enabled, the broker assigns the producer a PID, and every record carries a monotonically increasing sequence number per partition. The broker tracks the last sequence it accepted. If a retry arrives with a sequence number it has already seen, the broker acknowledges it but does not write it again. A gap in sequence numbers triggers an OutOfOrderSequenceException, surfacing a real problem rather than silently corrupting the log.
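The sequence check described above can be sketched as a toy model in plain Java. This is not broker code: the class SequenceDedupLog and its Result values are hypothetical names, and the model is simplified to one record per request for a single partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one partition's log with idempotent-producer deduplication. */
final class SequenceDedupLog {
    enum Result { APPENDED, DUPLICATE_ACKED, OUT_OF_ORDER }

    private final List<String> log = new ArrayList<>();
    // Last accepted sequence number per producer ID; absent means nothing accepted yet.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    Result append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            // Retry of a record that was already written: acknowledge, do not write again.
            return Result.DUPLICATE_ACKED;
        }
        if (sequence != last + 1) {
            // Gap in the sequence: reject instead of writing out of order.
            return Result.OUT_OF_ORDER;
        }
        log.add(value);
        lastSequence.put(producerId, sequence);
        return Result.APPENDED;
    }

    /** Returns an immutable snapshot of the records written so far. */
    List<String> records() {
        return List.copyOf(log);
    }
}
```

Appending sequence 0 twice for the same producer ID leaves one record in the log, while skipping from 0 to 2 is rejected. A different producer ID starts its own sequence at 0, so the model does not link two sessions of the same application.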

Enabling it is a single setting, and it pulls in the right companions automatically:

enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5

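When enable.idempotence=true, Kafka requires acks=all (the leader waits for every in-sync replica to acknowledge), retries greater than 0, and max.in.flight.requests.per.connection of at most 5. The client defaults already satisfy all three, so the values above only spell them out. If you set enable.idempotence=true explicitly alongside a conflicting value, the producer throws a ConfigException at startup rather than silently weakening the guarantee. If you rely on the default instead and set a conflicting value, recent 3.x clients disable idempotence rather than fail, so set it explicitly whenever you depend on it.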
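As a rough illustration, the same settings can be built as java.util.Properties and checked before a producer is created. The class IdempotenceConfigCheck and both of its methods are hypothetical helpers, not part of the Kafka client, and the check assumes the rule is acks=all, retries greater than 0, and at most 5 in-flight requests.

```java
import java.util.Properties;

final class IdempotenceConfigCheck {

    /** Builds producer properties with idempotence and its companion settings spelled out. */
    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    /** Hypothetical pre-flight check; missing keys fall back to the assumed client defaults. */
    static boolean compatibleWithIdempotence(Properties props) {
        String acks = props.getProperty("acks", "all");
        int retries = Integer.parseInt(
            props.getProperty("retries", String.valueOf(Integer.MAX_VALUE)));
        int inFlight = Integer.parseInt(
            props.getProperty("max.in.flight.requests.per.connection", "5"));
        boolean acksAll = acks.equals("all") || acks.equals("-1"); // "-1" is an alias for "all"
        return acksAll && retries > 0 && inFlight <= 5;
    }
}
```

A check like this is only a convenience for catching a bad override in a shared config file early. The client's own validation remains the authority.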

Pro Tip: Idempotence only deduplicates within a single producer session and a single partition. If your producer restarts, it gets a new PID, and the broker has no memory linking the old session to the new one. Idempotence prevents retry duplicates; it does not prevent application-level resends after a crash. That gap is exactly what transactions close.

Transactions: atomic writes across partitions

The idempotent producer solves duplicates from retries. Transactions solve a bigger problem: writing to multiple partitions (and committing consumer offsets) atomically, so that either everything in a unit of work becomes visible together or none of it does.

A transactional producer is configured with a stable, unique transactional.id. This ID is the key to crash recovery — it lets a restarting producer reclaim its identity and fence out any zombie instance of itself still running with the old session.

enable.idempotence=true
transactional.id=order-processor-1

The producer API then wraps work in explicit transaction boundaries:

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("enriched-orders", key, value));
    producer.send(new ProducerRecord<>("audit-log", key, audit));
    // commit consumer offsets as part of the same transaction (see below)
    producer.commitTransaction();
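} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // fatal: for example, a newer instance with the same transactional.id took over;
    // this producer cannot continue, so close it
    producer.close();
} catch (KafkaException e) {
    // recoverable: abort so none of this transaction's records become visible, then retry
    producer.abortTransaction();
}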

Two records to two different topics either both become visible or both vanish. The atomicity is real: the transaction coordinator on the broker side writes transaction markers to each affected partition, and only committed records are exposed to downstream consumers configured for it.

That consumer configuration is the other half. A consumer only respects transaction boundaries if you tell it to:

isolation.level=read_committed

With read_committed, the consumer skips records from aborted transactions and does not read records from in-flight transactions until they commit. The default, read_uncommitted, reads everything regardless of transaction outcome — which silently defeats the entire point. Setting isolation.level=read_committed on the consumer is the most commonly forgotten step in an exactly-once setup.
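To make the difference concrete, here is a toy model of a partition log that holds data records and transaction markers, read under both isolation levels. It is a sketch, not consumer internals: IsolationLevelModel, Entry, and the two read methods are hypothetical names, and list indices stand in for offsets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IsolationLevelModel {
    enum Kind { DATA, COMMIT_MARKER, ABORT_MARKER }

    static final class Entry {
        final Kind kind;
        final long producerId;
        final String value; // null for markers

        private Entry(Kind kind, long producerId, String value) {
            this.kind = kind;
            this.producerId = producerId;
            this.value = value;
        }
        static Entry data(long producerId, String value) { return new Entry(Kind.DATA, producerId, value); }
        static Entry commit(long producerId) { return new Entry(Kind.COMMIT_MARKER, producerId, null); }
        static Entry abort(long producerId) { return new Entry(Kind.ABORT_MARKER, producerId, null); }
    }

    /** read_uncommitted: every data record, whatever the transaction outcome. */
    static List<String> readUncommitted(List<Entry> log) {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind == Kind.DATA) out.add(e.value);
        }
        return out;
    }

    /** read_committed: committed data only, and nothing at or past the first still-open transaction. */
    static List<String> readCommitted(List<Entry> log) {
        Set<Integer> committed = new HashSet<>();
        Map<Long, List<Integer>> open = new HashMap<>(); // producer ID -> indices awaiting a marker
        for (int i = 0; i < log.size(); i++) {
            Entry e = log.get(i);
            if (e.kind == Kind.DATA) {
                open.computeIfAbsent(e.producerId, k -> new ArrayList<>()).add(i);
            } else {
                List<Integer> finished = open.remove(e.producerId);
                if (e.kind == Kind.COMMIT_MARKER && finished != null) committed.addAll(finished);
            }
        }
        int lastStable = log.size(); // first index belonging to an open transaction
        for (List<Integer> pending : open.values()) {
            lastStable = Math.min(lastStable, pending.get(0));
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lastStable; i++) {
            if (committed.contains(i)) out.add(log.get(i).value);
        }
        return out;
    }
}
```

In this model a committed record that sits behind a still-open transaction stays hidden until that transaction finishes. That is why a stuck transaction can stall read_committed consumers on the partition.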

Pro Tip: The transactional.id must be stable across restarts and unique per logical producer. If two running processes share the same transactional.id, one will be fenced with a ProducerFencedException, which is the system working correctly, not a bug. In a partitioned application, derive the ID deterministically, for example from a stable worker index or from the input partition assignment, so each worker keeps a consistent identity.
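Fencing can be pictured as an epoch counter kept per transactional.id. The sketch below is a toy coordinator, not Kafka's implementation: FencingModel, register, and accepts are hypothetical names, with register standing in for what happens when a producer calls initTransactions().

```java
import java.util.HashMap;
import java.util.Map;

/** Toy transaction coordinator that fences producers by epoch. */
final class FencingModel {
    private final Map<String, Integer> currentEpoch = new HashMap<>();

    /** Models a producer registering its transactional.id: the epoch is bumped and returned. */
    int register(String transactionalId) {
        return currentEpoch.merge(transactionalId, 1, Integer::sum);
    }

    /** A write is accepted only if it carries the newest epoch for that transactional.id. */
    boolean accepts(String transactionalId, int epoch) {
        Integer current = currentEpoch.get(transactionalId);
        return current != null && current == epoch;
    }
}
```

When a restarted worker registers with the same ID, the old instance's epoch becomes stale and its writes are refused. A worker with a different ID is unaffected, which is why the ID has to be unique per logical producer.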

The read-process-write pattern

Exactly-once truly shines in the read-process-write pattern: consume from a topic, transform, and produce to another topic. This is the backbone of every stream processing pipeline, and it is where duplicates do the most damage.

The danger in a naive implementation is the gap between producing output and committing the input offset. If you produce the output, then crash before committing the consumer offset, on restart you reprocess the input and produce the output again — a duplicate. If you commit the offset first and crash before producing, you lose the output. There is no ordering of two separate operations that avoids both.
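The two failure orderings can be simulated in a few lines. This is a deliberately small model with hypothetical names (TwoStepGap, Order): it crashes once between the two steps while handling the first record, then restarts from the committed offset and runs to completion. It assumes a non-empty input list.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoStepGap {
    enum Order { PRODUCE_THEN_COMMIT, COMMIT_THEN_PRODUCE }

    /** Returns the output topic's contents after one crash between the two steps and a restart. */
    static List<String> outputAfterCrashAndRestart(List<String> input, Order order) {
        List<String> output = new ArrayList<>();
        int committed = 0; // next input offset to read after a restart

        // First attempt: only the first of the two steps happens before the crash.
        if (order == Order.PRODUCE_THEN_COMMIT) {
            output.add(input.get(committed)); // produced, but the offset commit never happens
        } else {
            committed++;                      // committed, but the output is never produced
        }

        // Restart: resume from the committed offset, with no further crashes.
        while (committed < input.size()) {
            output.add(input.get(committed));
            committed++;
        }
        return output;
    }
}
```

With input r0, r1, producing first yields r0, r0, r1 (a duplicate), and committing first yields only r1 (a loss).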

The solution is to make offset commits part of the producer transaction. Kafka exposes exactly this through sendOffsetsToTransaction:

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
if (!records.isEmpty()) {
    // open the transaction only when there is work, so idle polls do not commit empty transactions
    producer.beginTransaction();
    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        String result = transform(record.value());
        producer.send(new ProducerRecord<>("output-topic", record.key(), result));
        // the committed offset is the next record to read, hence + 1
        offsets.put(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1));
    }
    // the input offsets join the same transaction as the output records
    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    producer.commitTransaction();
}

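Now the output records and the input offset commit are in one atomic transaction. Either both are durable or neither is. For this to hold, the consumer must run with enable.auto.commit=false, so that offsets advance only through the transaction. On a crash, the open transaction is aborted and the input offset is exactly where the last committed output ended, so reprocessing produces no duplicates and loses no data. That is exactly-once processing.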
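The effect of committing output and offset together can be shown with a small in-memory model. It is a sketch under simplifying assumptions: AtomicOffsetModel and runBatch are hypothetical names, the transform is just an uppercase conversion, and a crash is modeled as returning before the commit step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AtomicOffsetModel {
    private final List<String> committedOutput = new ArrayList<>();
    private long committedOffset = 0; // next input offset to read

    /** Processes up to batchSize records in one transaction; a crash before commit applies nothing. */
    void runBatch(List<String> input, int batchSize, boolean crashBeforeCommit) {
        List<String> pendingOutput = new ArrayList<>();
        long end = Math.min(input.size(), committedOffset + batchSize);
        long next = committedOffset;
        for (; next < end; next++) {
            pendingOutput.add(input.get((int) next).toUpperCase(Locale.ROOT));
        }
        if (crashBeforeCommit) {
            return; // aborted: the pending output and the pending offset are discarded together
        }
        // Commit: output and offset become visible in the same step.
        committedOutput.addAll(pendingOutput);
        committedOffset = next;
    }

    List<String> output() { return List.copyOf(committedOutput); }

    long offset() { return committedOffset; }
}
```

Running a successful batch, then a crashed batch, then a retry leaves each input reflected once in the output, because the crashed batch changed neither the output nor the offset.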

If you use Kafka Streams, all of this is handled for you with a single setting:

processing.guarantee=exactly_once_v2

exactly_once_v2 is the modern, more efficient implementation. It uses one producer per stream thread rather than one per task, which sharply reduces the overhead that made the original implementation costly at scale. It requires brokers on version 2.5 or newer. For any greenfield Streams application, this is the setting to use.

What exactly-once does not cover

This is where careful teams stay out of trouble. The guarantee is bounded, and the boundaries are sharp.

It is Kafka-to-Kafka. Exactly-once semantics apply when both the source of truth for offsets and the destination are Kafka. The transaction coordinates Kafka writes and Kafka offset commits atomically.
External side effects break it. If your processing writes to an external database, sends an email, or calls a payment API, those actions are not part of the Kafka transaction. A crash after the external call but before the transaction commit means the external effect happened but Kafka will reprocess. For these, you need idempotency at the sink — for example an upsert keyed on a unique record ID, or a dedup table.
Consumer must opt in. Without isolation.level=read_committed, downstream consumers see aborted and uncommitted records. The producer side cannot enforce this; it is the reader’s responsibility.
There is a throughput cost. Transactions add coordination overhead and latency from the commit cycle and transaction markers. exactly_once_v2 reduced this substantially, but it is not free. Batch your transactions sensibly rather than committing one record at a time.

Pro Tip: For sinks outside Kafka, do not try to force exactly-once through the Kafka transaction. Embrace at-least-once delivery into the sink and make the sink idempotent. An INSERT ... ON CONFLICT DO NOTHING keyed on the record’s topic, partition, and offset, or on a business key, is simpler, faster, and more robust than any attempt to span a Kafka transaction across a foreign system.
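An idempotent sink can be sketched with an in-memory map standing in for the database table. IdempotentSink and insertIfAbsent are hypothetical names. The key combines topic, partition, and offset, since an offset alone is not unique across partitions, and putIfAbsent plays the role of the ON CONFLICT DO NOTHING clause.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IdempotentSink {
    // Stand-in for a table with a unique key on (topic, partition, offset).
    private final Map<String, String> rows = new LinkedHashMap<>();

    /** Returns true if the row was inserted, false if the key already existed (nothing changes). */
    boolean insertIfAbsent(String topic, int partition, long offset, String payload) {
        String key = topic + ":" + partition + ":" + offset;
        return rows.putIfAbsent(key, payload) == null;
    }

    int size() {
        return rows.size();
    }
}
```

Redelivering the same record after a crash then becomes a no-op at the sink, so at-least-once delivery from Kafka is enough.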

Putting it together

A correct exactly-once pipeline is a small, specific set of choices, not a mysterious mode you flip on. The producer runs idempotent with acks=all. Multi-partition work and offset commits live inside transactions with a stable transactional.id. The read-process-write loop uses sendOffsetsToTransaction so input progress and output records commit atomically. Downstream consumers read with read_committed. And anything that touches the world outside Kafka gets idempotency at the sink rather than a false promise from the transaction.

Get those five things right and you have a pipeline that survives broker failures, producer restarts, and network blips without losing a record or emitting a duplicate. Get the isolation.level wrong or assume the guarantee extends to your database, and you have at-least-once wearing an exactly-once label — which is worse than honest at-least-once, because you stopped writing the dedup logic you actually still needed.

Exactly-once is a precise, powerful tool. Respect its boundaries and it will do exactly what its name promises, within the system it was designed for. If you are also hardening the operational side of your cluster, the same precision pays off when securing Kafka with TLS, SASL, and ACLs and when setting up monitoring.

— James