Tuning Kafka Producer Throughput and Latency

The Kafka producer is one of the most tunable components in the stack, and also one of the most often left on defaults that quietly cap your throughput or inflate your tail latency. Almost every producer setting is a deliberate trade between three things you cannot all maximize at once: throughput, latency, and durability. Crank batching for throughput and you add latency. Demand the strongest durability with acks=all and every write waits for the in-sync followers to replicate it. The goal of tuning is not to find one magic config but to decide, per workload, which corner of that triangle you actually need. This guide walks through the settings that matter (batching, compression, acks, and idempotence) and how to reason about them together.

The throughput, latency, durability triangle

Before changing a single value, decide what the workload demands. A clickstream pipeline tolerates seconds of latency and wants maximum throughput. A payment authorization path wants the lowest possible latency and absolute durability and will sacrifice raw throughput to get them. Those two workloads should not share a producer config.

Setting	Push toward throughput	Push toward latency	Push toward durability
`batch.size`	Larger	Smaller	Neutral
`linger.ms`	Higher	Near 0	Neutral
`compression.type`	`lz4`/`zstd`	`none`/`lz4`	Neutral
`acks`	`1`	`1`	`all`
`enable.idempotence`	`true`	`true`	`true`

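Notice that idempotence is true in every column. In modern Kafka (3.x) it is on by default and there is rarely a reason to disable it: at almost no cost, it gives you exactly-once, in-order writes per partition for the lifetime of a producer session, and it prevents the duplicate records that retries would otherwise create. The catch is that idempotence requires acks=all. If you take acks=1 for throughput or latency, the 3.x client disables idempotence, or rejects the config outright if enable.idempotence=true is also set explicitly, so keeping it on means paying for acks=all.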

Batching: the single biggest throughput lever

Kafka producers batch records per partition before sending. Two settings control batching, and together they are the most powerful throughput tool you have.

batch.size is the maximum size in bytes of a single batch per partition. The default is 16384 (16 KB), which is small for high-throughput workloads.
linger.ms is how long the producer waits to fill a batch before sending it. The default is 0, meaning the producer sends as soon as it can rather than waiting to accumulate records.

With linger.ms=0, a busy producer still batches whatever has accumulated while the previous send was in flight, but you leave throughput on the table. Raising linger.ms to even 5–20 ms lets batches fill, which dramatically improves throughput and compression ratio at the cost of a few milliseconds of added latency.
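To see why the linger window matters, here is a toy Python model of one partition's batching, a simplification under stated assumptions rather than the producer's real RecordAccumulator. It closes a batch when the next record would overflow batch.size or when linger.ms has passed since the batch's first record, and it ignores the in-flight batching described above, so linger.ms=0 is its worst case. The function name count_batches and the arrival pattern are hypothetical.

```python
# Toy model of per-partition batching; not Kafka's real RecordAccumulator.


def count_batches(arrival_ms, record_bytes, batch_size, linger_ms):
    """Count the batches (produce requests) sent for one partition.

    A batch closes when the next record would push it past batch_size, or
    when linger_ms has elapsed since the batch's first record arrived.
    """
    batches = 0
    opened_at = None  # arrival time of the open batch's first record
    open_bytes = 0
    for t in sorted(arrival_ms):
        # Linger window expired before this record arrived: send the batch.
        if opened_at is not None and t - opened_at >= linger_ms:
            batches += 1
            opened_at, open_bytes = None, 0
        # This record would overflow batch.size: send the batch first.
        if opened_at is not None and open_bytes + record_bytes > batch_size:
            batches += 1
            opened_at, open_bytes = None, 0
        if opened_at is None:
            opened_at = t
        open_bytes += record_bytes
    if opened_at is not None:
        batches += 1  # flush the last partial batch
    return batches


# 1,000 records of 1 KB arriving 10 per millisecond (hypothetical load).
arrivals = [i // 10 for i in range(1000)]
for size, linger in [(16384, 0), (16384, 20), (131072, 20)]:
    print(f"batch.size={size} linger.ms={linger}: "
          f"{count_batches(arrivals, 1024, size, linger)} batches")
```

Under this load the model sends every record on its own at linger.ms=0, needs 63 requests with the default 16 KB batch.size and linger.ms=20, and only 8 with 128 KB batches.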

# High-throughput producer profile
batch.size=131072
linger.ms=20
compression.type=lz4
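acks=1
# acks=1 is incompatible with idempotence: setting enable.idempotence=true
# alongside it fails with a ConfigException, and leaving it unset lets the
# 3.x client turn idempotence off. Switch to acks=all to keep idempotence.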
buffer.memory=67108864

Here batch.size is raised to 128 KB and linger.ms to 20 ms so batches fill before sending. buffer.memory (default 32 MB) is the total memory the producer uses to buffer records waiting to be sent; raise it for high-throughput producers so a slow broker does not block your application threads prematurely.
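A rough way to pick buffer.memory is to ask how long a broker slowdown you want to absorb before send() starts blocking. The Python sketch below is back-of-the-envelope arithmetic under an assumed steady produce rate and a hypothetical 1.5x headroom factor; it ignores compression and the fact that the producer allocates memory in batch.size chunks per partition, so treat the result as a starting point to validate with measurements.

```python
MIB = 1024 * 1024
DEFAULT_BUFFER_MEMORY = 32 * MIB  # Kafka producer default (33554432 bytes)


def buffer_memory_for_stall(produce_mib_per_s, stall_s, headroom=1.5):
    """Bytes needed to keep accepting sends through a broker stall of stall_s seconds.

    Assumes a steady, uncompressed produce rate; headroom is an arbitrary safety factor.
    """
    return int(produce_mib_per_s * MIB * stall_s * headroom)


needed = buffer_memory_for_stall(produce_mib_per_s=20, stall_s=2)
print(needed, needed > DEFAULT_BUFFER_MEMORY, needed <= 64 * MIB)
```

At 20 MiB/s and a two-second stall, the estimate is 62914560 bytes (60 MiB): more than the 32 MB default, but within the 67108864 bytes used in the high-throughput profile.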

Pro Tip: Tune batch.size and linger.ms together, never alone. A large batch.size with linger.ms=0 rarely fills the batch on bursty traffic, and a high linger.ms with a tiny batch.size just adds latency without improving throughput. They are a pair.
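The pairing follows from a little arithmetic: at a steady per-partition rate, a batch fills in batch.size divided by the rate, and the producer sends it at whichever comes first, the fill time or linger.ms. The Python helper below (the name batch_trigger is hypothetical) makes that explicit; it assumes a constant arrival rate and ignores in-flight limits.

```python
def batch_trigger(bytes_per_ms, batch_size, linger_ms):
    """Return (which setting fires, ms until send, bytes in the batch) for one partition.

    Assumes records arrive at a constant bytes_per_ms into that partition.
    """
    fill_ms = batch_size / bytes_per_ms
    if fill_ms <= linger_ms:
        return "batch.size", fill_ms, batch_size
    return "linger.ms", linger_ms, bytes_per_ms * linger_ms


# A busy partition fills 128 KB before the 20 ms linger expires...
print(batch_trigger(bytes_per_ms=8192, batch_size=131072, linger_ms=20))
# ...while a quiet one sends a 2,000-byte batch after waiting the full 20 ms.
print(batch_trigger(bytes_per_ms=100, batch_size=131072, linger_ms=20))
```

On the quiet partition, a bigger batch.size changes nothing; only a longer linger.ms or more traffic per partition grows the batch.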

When the producer’s buffer fills faster than the broker can accept records, send() blocks for up to max.block.ms (default 60 seconds) and then fails with a TimeoutException. If you see TimeoutException from send(), the fix is usually more buffer.memory, a faster broker, or backpressure in your application, not a longer timeout.
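Backpressure means the application stops pulling new input while the producer's buffer is full, instead of waiting longer on each send. The Python sketch below models that with a bounded queue.Queue standing in for buffer.memory and a put() timeout standing in for max.block.ms. BoundedBuffer and pump are hypothetical names, and a real producer's send() is asynchronous, so this only mirrors the block-then-fail behavior.

```python
import queue
from collections import deque


class BoundedBuffer:
    """Hypothetical stand-in for the producer buffer (buffer.memory / max.block.ms)."""

    def __init__(self, capacity, max_block_s):
        self._q = queue.Queue(maxsize=capacity)
        self._max_block_s = max_block_s

    def send(self, record):
        try:
            # Block up to max_block_s for space, like send() waiting on buffer.memory.
            self._q.put(record, timeout=self._max_block_s)
        except queue.Full:
            raise TimeoutError("buffer still full after max block time") from None

    def drain(self, n):
        """Simulate the broker accepting up to n buffered records."""
        for _ in range(min(n, self._q.qsize())):
            self._q.get_nowait()


def pump(pending, buffer):
    """Move records from pending into buffer; stop reading input when it stays full."""
    sent = 0
    while pending:
        try:
            buffer.send(pending[0])
        except TimeoutError:
            break  # backpressure: leave the rest in pending and try again later
        pending.popleft()
        sent += 1
    return sent


pending = deque(range(5))
buf = BoundedBuffer(capacity=3, max_block_s=0.01)
print(pump(pending, buf), len(pending))  # 3 2
buf.drain(2)
print(pump(pending, buf), len(pending))  # 2 0
```

The unsent records stay with the caller, which is where the application can slow its own intake.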

Compression: cheap throughput if you pick the right codec

Compression happens on the producer, on the whole batch, before the data crosses the network. That means bigger batches compress better, which is another reason batching and compression reinforce each other. The codec choice is a CPU-versus-ratio trade.
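The batch-size effect on compression is easy to reproduce. The Python sketch below uses zlib (DEFLATE, the algorithm behind gzip) only because lz4 and zstd are not in the standard library, and it compresses raw JSON rather than Kafka's record-batch format, so the absolute numbers will not match a real producer; the point is the direction. The record shape and helper names are hypothetical.

```python
import json
import zlib


def make_records(n):
    """Hypothetical clickstream events: repetitive keys and values, like many real topics."""
    return [
        json.dumps({"user_id": i % 50, "event": "page_view", "path": f"/products/{i % 20}"}).encode()
        for i in range(n)
    ]


def compressed_one_by_one(records):
    """Total size if every record were compressed on its own (no batching)."""
    return sum(len(zlib.compress(r)) for r in records)


def compressed_as_batch(records):
    """Size if the records were compressed together as one batch."""
    return len(zlib.compress(b"".join(records)))


records = make_records(500)
raw = sum(len(r) for r in records)
print(f"raw={raw} one-by-one={compressed_one_by_one(records)} batch={compressed_as_batch(records)}")
```

Compressed one at a time, small records barely shrink; compressed together, the repeated keys and values collapse.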

lz4 is the pragmatic default for most workloads: fast compression and decompression with a solid ratio.
zstd gives a better compression ratio than lz4, often meaningfully reducing network and disk usage, at a modest CPU cost. It is an excellent choice when bandwidth or storage is the constraint.
snappy is fast but generally beaten by lz4 on ratio.
gzip compresses well but has the highest CPU cost, and zstd usually matches or beats its ratio for less CPU, so gzip is rarely worth it for high-throughput paths.

# Bandwidth-constrained profile favoring ratio
compression.type=zstd
batch.size=262144
linger.ms=50

A subtle but important point: keep the codec consistent across producers writing to the same topic, and be aware that brokers may recompress if the topic-level compression.type differs from the producer’s. The least surprising setup is producers compressing with a chosen codec and the topic left at compression.type=producer (the broker default), so the broker stores the batch as-is.
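The codec rule can be written down as a small Python predicate. It is a simplified sketch: broker_rewrites_batch is a hypothetical name, and it covers only the codec comparison, not other reasons a broker may rewrite batches, such as message-format conversion. Note that the topic-level setting spells "no compression" as uncompressed, while the producer uses none.

```python
# Valid compression.type values on each side.
PRODUCER_CODECS = {"none", "gzip", "snappy", "lz4", "zstd"}
TOPIC_CODECS = {"producer", "uncompressed", "gzip", "snappy", "lz4", "zstd"}


def broker_rewrites_batch(producer_codec, topic_codec="producer"):
    """Return True if the broker would recompress (or decompress) a producer's batch.

    Simplified: compares codecs only.
    """
    if producer_codec not in PRODUCER_CODECS:
        raise ValueError(f"unknown producer compression.type: {producer_codec!r}")
    if topic_codec not in TOPIC_CODECS:
        raise ValueError(f"unknown topic compression.type: {topic_codec!r}")
    if topic_codec == "producer":
        return False  # keep whatever codec the producer used
    target = "none" if topic_codec == "uncompressed" else topic_codec
    return target != producer_codec


for producer, topic in [("lz4", "producer"), ("lz4", "zstd"), ("none", "uncompressed")]:
    print(producer, topic, broker_rewrites_batch(producer, topic))
```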

acks and durability: the most consequential setting

acks controls how many broker replicas must acknowledge a write before the producer considers it successful. It is the dial that decides whether you can lose data.

acks=0: fire and forget. The producer never waits for acknowledgment. Highest throughput, but a record can be lost the instant the leader fails. Almost never appropriate for data you care about.
acks=1: the leader acknowledges after writing to its own log, before followers replicate. Fast, but if the leader fails before replication, that record is gone.
acks=all (also -1): the leader waits until all in-sync replicas have the record. This is the durable setting, and it is what any data you cannot afford to lose should use.

The critical detail with acks=all is that it only protects you in combination with the topic’s min.insync.replicas. If min.insync.replicas=2, a produce with acks=all succeeds only when at least two replicas have the record. If ISR shrinks below that, produces fail rather than silently accepting un-durable writes — which is the behavior you want. We cover that interaction in depth in Designing Kafka Topics: Partitions and Replication.
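The interaction between acks and min.insync.replicas fits in a few lines. The Python sketch below is a toy model of a single produce request (the names ProduceResult and produce are hypothetical). It reflects two real rules, that min.insync.replicas is only enforced for acks=all and that an ISR smaller than that minimum makes acks=all produces fail with a NOT_ENOUGH_REPLICAS error, but it ignores timing, retries, and ISR changes mid-request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProduceResult:
    ok: bool
    confirmed_copies: int  # replicas known to hold the record when the producer hears back
    error: Optional[str] = None


def produce(acks, isr_size, min_insync_replicas):
    """Toy model of one produce request to a partition leader.

    acks is "0", "1", "all", or "-1"; isr_size counts the leader plus in-sync followers.
    """
    if acks == "0":
        return ProduceResult(ok=True, confirmed_copies=0)  # fire and forget: nothing confirmed
    if acks == "1":
        # Leader-only ack; min.insync.replicas is not consulted.
        return ProduceResult(ok=True, confirmed_copies=1)
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            # Refuse the write rather than accept it with too few copies.
            return ProduceResult(ok=False, confirmed_copies=0, error="NOT_ENOUGH_REPLICAS")
        return ProduceResult(ok=True, confirmed_copies=isr_size)  # every ISR member has it
    raise ValueError(f"unsupported acks value: {acks!r}")


# Replication factor 3, min.insync.replicas=2, one follower out of the ISR.
for acks in ("1", "all"):
    print(acks, produce(acks, isr_size=2, min_insync_replicas=2))
# The ISR shrinks to the leader alone.
for acks in ("1", "all"):
    print(acks, produce(acks, isr_size=1, min_insync_replicas=2))
```

In the last case acks=1 still reports success with a single copy, which is exactly the window of loss the durable profile is meant to close.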

# Durable, low-loss profile for critical data
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
delivery.timeout.ms=120000

With idempotence enabled, retries can be set very high safely because the broker deduplicates retried records using the producer’s sequence numbers. The real bound on retries is delivery.timeout.ms, the total time the producer will spend trying to deliver a record including retries. Tune that to your SLA rather than fiddling with retry counts.
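The Java client rejects an explicitly set delivery.timeout.ms smaller than linger.ms plus request.timeout.ms, and idempotence needs acks=all, retries above 0, and at most 5 in-flight requests per connection. It is cheap to check these before deploying a config. The Python checker below is a hypothetical helper, not part of any Kafka tooling; it assumes the 3.x client defaults for unset keys and treats an unset enable.idempotence as true, so it also flags configs where the client would quietly turn idempotence off.

```python
# Kafka 3.x Java client defaults for the keys this checker reads.
DEFAULTS = {
    "linger.ms": "0",
    "request.timeout.ms": "30000",
    "delivery.timeout.ms": "120000",
    "enable.idempotence": "true",
    "acks": "all",
    "retries": "2147483647",
    "max.in.flight.requests.per.connection": "5",
}


def check_producer_props(props):
    """Return a list of problems found in a producer config (string values, as in .properties)."""
    cfg = {**DEFAULTS, **props}
    problems = []
    if int(cfg["delivery.timeout.ms"]) < int(cfg["linger.ms"]) + int(cfg["request.timeout.ms"]):
        problems.append("delivery.timeout.ms must be >= linger.ms + request.timeout.ms")
    # An unset enable.idempotence counts as true here, so quiet disabling gets flagged too.
    if cfg["enable.idempotence"].lower() == "true":
        if cfg["acks"] not in ("all", "-1"):
            problems.append("idempotence requires acks=all")
        if int(cfg["retries"]) <= 0:
            problems.append("idempotence requires retries > 0")
        if int(cfg["max.in.flight.requests.per.connection"]) > 5:
            problems.append("idempotence requires max.in.flight.requests.per.connection <= 5")
    return problems


durable = {
    "acks": "all",
    "enable.idempotence": "true",
    "retries": "2147483647",
    "max.in.flight.requests.per.connection": "5",
    "delivery.timeout.ms": "120000",
}
print(check_producer_props(durable))  # []
print(check_producer_props({"acks": "1", "enable.idempotence": "true"}))
```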

Idempotence and exactly-once

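Idempotence (enable.idempotence=true, the 3.x default) gives each producer a producer ID and per-partition sequence numbers so the broker discards duplicate records from retries. It is what makes aggressive retries safe. For idempotence to hold, acks must be all, retries must be greater than 0, and max.in.flight.requests.per.connection must be 5 or less. The client does not quietly fix conflicting values: if you set enable.idempotence=true explicitly alongside one, the producer fails at startup with a ConfigException; if you leave it unset, the client disables idempotence instead.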
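The broker-side half of idempotence can be modeled as a per-partition check of each producer's next expected sequence number. The Python class below is a deliberately simplified sketch (PartitionLog is a hypothetical name): real brokers cache metadata for the last few batches per producer and assign sequence numbers per record within a batch, but the outcome for a retried write is the same, an acknowledgment without a second copy.

```python
class PartitionLog:
    """Toy partition that drops retried writes by (producer ID, sequence number)."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer ID -> sequence number of its last appended record

    def append(self, producer_id, sequence, value):
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            # Retry of a write that already landed (e.g. the ack was lost): ack, don't re-append.
            return "DUPLICATE"
        if sequence != last + 1:
            # A gap means an earlier write is missing; the broker rejects rather than reorder.
            return "OUT_OF_ORDER_SEQUENCE_NUMBER"
        self.records.append(value)
        self._last_seq[producer_id] = sequence
        return "APPENDED"


log = PartitionLog()
print(log.append(producer_id=7, sequence=0, value="order-1"))  # APPENDED
print(log.append(producer_id=7, sequence=0, value="order-1"))  # DUPLICATE (retry)
print(log.append(producer_id=7, sequence=1, value="order-2"))  # APPENDED
print(log.records)  # ['order-1', 'order-2']
```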

For end-to-end exactly-once across a read-process-write pipeline, you go a step further with transactions:

transactional.id=orders-enricher-1
enable.idempotence=true
acks=all

Transactions let a consumer-process-produce loop commit its consumed offsets and produced records atomically, so a crash mid-flight neither double-processes nor drops data. Downstream consumers must set isolation.level=read_committed to get that guarantee; the default, read_uncommitted, also returns records from aborted transactions. Transactions add coordination overhead and require careful transactional.id management, so reach for them only when the workload genuinely needs atomic, exactly-once processing rather than just no producer-side duplicates.

Pro Tip: Each transactional.id must be stable and unique per logical producer instance. Reusing the same ID across two live producers fences one of them off; generating a fresh ID on every restart defeats the zombie-fencing protection. Map the ID to a stable instance identity, like a Kubernetes StatefulSet pod ordinal.
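One way to follow that advice, sketched in Python under the assumption that the producer runs in a Kubernetes StatefulSet: pods there are named <statefulset-name>-<ordinal>, and the container hostname (usually exposed as the HOSTNAME environment variable) matches the pod name, so the ordinal survives restarts. The function name transactional_id and the prefix-plus-ordinal scheme are hypothetical.

```python
import os
import re


def transactional_id(prefix, pod_name=None):
    """Build a stable transactional.id such as "orders-enricher-1" from a StatefulSet pod name.

    StatefulSet pods are named <statefulset-name>-<ordinal>; the ordinal is
    reused when a pod restarts, so the derived ID stays the same.
    """
    pod_name = pod_name or os.environ.get("HOSTNAME", "")
    match = re.fullmatch(r".+-(\d+)", pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return f"{prefix}-{int(match.group(1))}"


print(transactional_id("orders-enricher", "orders-enricher-1"))  # orders-enricher-1
```

Only use this inside a StatefulSet: Deployment pod names end in a random suffix, which would produce a fresh ID on every restart.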

Measuring before and after

Never tune by feel. Use the built-in performance tool to get a repeatable baseline and to validate every change:

kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 5000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=kafka:9092 \
    acks=all batch.size=131072 linger.ms=20 compression.type=lz4

The tool reports records/sec, MB/sec, and latency percentiles. Change one variable at a time, rerun, and compare. The percentiles matter more than the average — a config that improves mean throughput while blowing out p99 latency is the wrong choice for a latency-sensitive path. This is exactly the kind of structured before/after data an AI assistant can summarize across many runs, flagging which single change moved p99 without helping median throughput.
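Comparing runs by hand gets tedious, so here is a Python sketch that pulls the final summary line out of kafka-producer-perf-test.sh output, checks that two configs differ in exactly one setting, and reports percent changes. It assumes the summary format printed by recent Kafka versions ("N records sent, X records/sec (Y MB/sec), ... ms 50th, ... ms 95th, ... ms 99th, ... ms 99.9th."), so check the regex against your version's output. parse_summary, changed_settings, pct_change, and compare are hypothetical names.

```python
import re

# Final summary line of kafka-producer-perf-test.sh (format assumed from recent versions).
SUMMARY = re.compile(
    r"(?P<records>\d+) records sent, (?P<rps>[\d.]+) records/sec \((?P<mbps>[\d.]+) MB/sec\), "
    r"(?P<avg_ms>[\d.]+) ms avg latency, (?P<max_ms>[\d.]+) ms max latency, "
    r"(?P<p50>\d+) ms 50th, (?P<p95>\d+) ms 95th, (?P<p99>\d+) ms 99th, (?P<p999>\d+) ms 99.9th"
)


def parse_summary(output):
    """Return the metrics from the last summary line in output, or None if there is none."""
    matches = list(SUMMARY.finditer(output))
    if not matches:
        return None  # progress lines lack percentiles, so they never match
    return {name: float(value) for name, value in matches[-1].groupdict().items()}


def changed_settings(baseline_props, candidate_props):
    """Keys whose values differ; a clean experiment changes exactly one."""
    keys = set(baseline_props) | set(candidate_props)
    return {k for k in keys if baseline_props.get(k) != candidate_props.get(k)}


def pct_change(old, new):
    """Percent change from old to new; guards against a 0 ms baseline percentile."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return 100.0 * (new - old) / old


def compare(baseline, candidate, metrics=("mbps", "p50", "p99")):
    """Percent change per metric (positive means the number went up)."""
    return {m: pct_change(baseline[m], candidate[m]) for m in metrics}
```

Feed it the captured stdout of two runs whose configs pass the changed_settings check; a candidate that lifts mbps while p99 climbs sharply is a regression for a latency-sensitive path.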

Key takeaways

Point	Details
Tune to the workload	Throughput, latency, and durability trade off; pick the corner the workload needs.
Batching is the top lever	Raise `batch.size` and `linger.ms` together for throughput; both matter, neither alone.
Compression pairs with batching	`lz4` for balance, `zstd` when bandwidth or storage is the constraint.
`acks=all` plus `min.insync.replicas`	Durability comes from the combination, not `acks` alone.
Keep idempotence on	It makes aggressive retries safe and is the 3.x default; bound delivery with `delivery.timeout.ms`.

What I actually change first

When someone hands me an underperforming producer, the first thing I look at is linger.ms, because it is almost always sitting at the default 0 and the workload could easily absorb 10–20 ms of latency. That one change, paired with a larger batch.size, routinely doubles throughput on a busy producer and improves the compression ratio for free. It is the highest-leverage, lowest-risk knob in the whole set.

The setting I refuse to compromise on is acks for anything carrying real data. I have watched a team chase a throughput target down to acks=1 and then lose a window of records during a leader election; the throughput win was not worth the postmortem. In 3.x, acks=all is already the default because idempotence requires it, and its cost is roughly one replication round trip. That is cheap insurance.

My read: producer tuning rewards measurement over intuition. Use kafka-producer-perf-test.sh, change one variable at a time, and watch p99 as closely as the average. AI is a good partner for digesting the resulting matrix of runs, but the discipline of one-variable-at-a-time is yours to keep.

— James

Build your AI Kafka workflow with DevOps AI ToolKit

DevOps AI ToolKit publishes prompts and automation guides for engineers running production streaming systems. Browse the full AI prompt library for prompts that help you summarize performance test runs, compare config matrices, and document tuning decisions.

FAQ

What is the single most effective Kafka producer throughput setting?

Batching. Raising batch.size and linger.ms together lets the producer accumulate larger batches before sending, which improves both throughput and compression ratio at the cost of a few milliseconds of latency.

Does acks=all guarantee no data loss?

Only in combination with the topic’s min.insync.replicas. acks=all waits for all in-sync replicas, and min.insync.replicas defines how many must be present for a write to succeed, so produces fail rather than accept un-durable writes.

Should I disable idempotence for more throughput?

No. Idempotence is the default in Kafka 3.x, costs very little, and makes aggressive retries safe by deduplicating records. Disabling it reintroduces the duplicate-on-retry problem with no meaningful throughput gain.

Which compression codec should I use?

lz4 is the balanced default. Choose zstd when network bandwidth or storage is the constraint and you can spend a little more CPU for a better ratio.