Designing Kafka Topics: Partitions and Replication

Topic design is the set of Kafka decisions that are easiest to get wrong and hardest to undo. Partition count, partition keys, replication factor, and retention policy all shape throughput, ordering guarantees, durability, and cost — and several of them cannot be changed cleanly after data is flowing. You can add partitions but never remove them, and adding them breaks key-to-partition mapping. You can raise replication factor but only through a deliberate reassignment. Get the design right at creation time and the topic scales with you for years; get it wrong and you inherit a migration project. This guide covers how to reason about each decision so the topic you create today still fits the workload a year from now.

Partition count: the parallelism and ordering trade

Partitions are Kafka’s unit of parallelism. A topic’s partition count sets the ceiling on how many consumers in a single group can read it concurrently, because each partition is consumed by exactly one member of a group. Twelve partitions means at most twelve active consumers; a thirteenth sits idle. So partition count is fundamentally a throughput decision.

But it is also an ordering decision. Kafka guarantees order within a partition, not across a topic. Records with the same key go to the same partition and stay ordered relative to each other. Records with different keys may land on different partitions and have no ordering guarantee between them. More partitions means more parallelism but a smaller scope of guaranteed ordering.

A practical sizing method:

Estimate target throughput for the topic in MB/sec.
Benchmark the per-partition throughput your consumers actually achieve (use kafka-producer-perf-test.sh and a representative consumer).
Divide target by per-partition throughput to get a baseline partition count.
Add headroom — round up generously, because adding partitions later is disruptive for keyed topics.

# Create a topic with a deliberate partition count and replication factor
kafka-topics.sh --bootstrap-server kafka:9092 \
  --create --topic orders \
  --partitions 24 \
  --replication-factor 3 \
  --config min.insync.replicas=2

Pro Tip: Resist the urge to create topics with hundreds of partitions “just in case.” Every partition adds open file handles, memory for the broker’s index, and replication overhead, and it lengthens leader-election and recovery time after a broker failure. Size for your real throughput plus reasonable headroom, not for an imaginary peak.

The reason headroom matters so much is the next section: changing partition count after the fact is not free.

Partition keys and the cost of resizing

When you produce a keyed record, Kafka hashes the key and assigns the record to hash(key) % partition_count. This is what keeps all events for a given customer, order, or device on the same partition and therefore in order. It is also why adding partitions is disruptive.

The moment you change partition_count, the modulo changes, so a key that used to map to partition 3 may now map to partition 7. New records for that key go to partition 7 while old records sit in partition 3, and per-key ordering is broken for every key that moved. For topics where ordering matters, this is a genuine correctness problem, not just a rebalance hiccup.

This leads to a few firm rules:

Choose a high-cardinality key so records distribute evenly. Keying by a low-cardinality field like country creates hot partitions where one or two partitions take most of the traffic and the rest sit idle — the skew problem behind a lot of consumer lag.
Size partitions up front for keyed topics, because resizing breaks ordering. Over-provision rather than plan to grow.
Use a null key only when you do not need ordering. Null-keyed records are distributed across partitions (sticky-batched in modern clients), which maximizes spread but gives up per-key ordering entirely.

# Inspect current partition layout and leadership before any resize
kafka-topics.sh --bootstrap-server kafka:9092 --describe --topic orders

If you must add partitions to a keyed topic, plan it like a migration: pause producers or accept a known ordering break, document the cutover, and communicate it to consumers that rely on ordering.

Replication factor and durability

Replication factor (RF) is how many copies of each partition Kafka keeps across brokers. RF is your protection against broker and disk failure. The near-universal production standard is RF=3: one leader and two followers, which tolerates the loss of one broker with zero data loss and still functions during planned maintenance.

Replication factor	Failure tolerance	Typical use
1	None — any broker loss loses data	Dev/test only
2	Survives one broker loss but tight during maintenance	Rarely recommended
3	Survives one broker loss with maintenance headroom	Production standard

RF=1 has no redundancy and belongs only in throwaway environments. RF=2 technically survives one failure but leaves no headroom: if one broker is down for maintenance and another fails, you are exposed. RF=3 is the default recommendation precisely because it tolerates one failure and a concurrent maintenance window.

You can change RF after creation, but only through a partition reassignment with kafka-reassign-partitions.sh, which moves real data across brokers and consumes network and disk. It is a planned operation, not a quick toggle, so set RF correctly at creation.

min.insync.replicas: where durability is actually enforced

Replication factor alone does not guarantee durability. The setting that enforces it is min.insync.replicas, working together with the producer’s acks=all.

min.insync.replicas defines how many replicas must be in sync for a produce request with acks=all to succeed. The standard pairing is RF=3 with min.insync.replicas=2. That combination means:

A write succeeds only when at least two replicas (leader plus one follower) have the record.
You can lose one broker and still accept writes, because two replicas remain.
If a second broker drops and only one replica is in sync, produces fail rather than silently accepting writes that could be lost.

That last behavior is the whole point. Setting min.insync.replicas=1 on an RF=3 topic quietly defeats the durability you paid for, because a single in-sync replica is enough to accept a write that a leader failure could then erase. And setting min.insync.replicas equal to RF removes all failure headroom — lose any one broker and the topic stops accepting writes. RF=3 with min.insync.replicas=2 is the balanced sweet spot.

# Enforce durable writes on an existing topic
kafka-configs.sh --bootstrap-server kafka:9092 \
  --alter --topic orders \
  --add-config min.insync.replicas=2

Pro Tip: min.insync.replicas is a per-topic config but also has a broker-level default. Set it explicitly per topic for anything carrying important data — relying on the broker default means a topic created without it silently inherits whatever the cluster default happens to be, which may be 1.

This interaction is exactly why producer acks and topic design must be reasoned about together, a point we expand on in Tuning Kafka Producer Throughput and Latency.

Retention versus compaction: two different jobs

Kafka topics have a cleanup.policy, and choosing between delete and compact is a fundamental design decision because they solve different problems.

Time and size retention (delete)

The default policy, delete, removes records once they age past retention.ms or once the partition exceeds retention.bytes. This is right for event streams — logs, metrics, clickstream — where old records lose value and you want bounded storage.

# Keep 7 days of events, capped at 50 GB per partition
kafka-configs.sh --bootstrap-server kafka:9092 \
  --alter --topic events \
  --add-config retention.ms=604800000,retention.bytes=53687091200

Log compaction (compact)

The compact policy keeps only the latest value for each key, retaining the most recent record per key indefinitely while removing superseded older values. This turns a topic into a durable changelog or snapshot of current state — ideal for things like the latest configuration per service, the current balance per account, or Kafka Streams state-store backing topics.

# Retain the latest value per key as a changelog
kafka-topics.sh --bootstrap-server kafka:9092 \
  --create --topic account-balances \
  --partitions 12 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.insync.replicas=2

A few compaction details that trip people up:

A record with a key and a null value is a tombstone — compaction treats it as a delete for that key and eventually removes it.
Compaction is asynchronous and does not run instantly; there is always a recent “dirty” portion of the log that still holds duplicates.
You can combine policies with cleanup.policy=compact,delete to compact and enforce a retention bound, useful when you want current-state semantics but also a hard storage cap.

Choosing the wrong policy is a quiet but expensive mistake. A changelog topic left on delete loses state when records age out; an event stream set to compact keyed on something near-unique never reclaims space.

Key takeaways

Point	Details
Partitions set the parallelism ceiling	At most one consumer per partition per group; size for throughput plus headroom.
Resizing breaks keyed ordering	Adding partitions changes key-to-partition mapping; over-provision keyed topics up front.
RF=3 is the production standard	Tolerates one broker loss with maintenance headroom; RF=1 is dev-only.
Durability lives in min.insync.replicas	RF=3 with `min.insync.replicas=2` and `acks=all` is the durable, balanced pairing.
Pick the right cleanup policy	`delete` for event streams, `compact` for changelogs and current-state topics.

The decisions I treat as irreversible

When I design a topic, I mentally split the settings into “reversible” and “effectively permanent.” Retention and min.insync.replicas I can change on a live topic with one command, so I do not agonize over them. Partition count for a keyed topic I treat as effectively permanent, because changing it breaks ordering for every key that moves, and that is a correctness problem disguised as a config change. So I spend my design energy on partition count and the choice of key, and I over-provision partitions without apology.

The mistake I see most often is min.insync.replicas=1 on an RF=3 topic. The team set RF=3, feels protected, and never realizes that a single in-sync replica is enough to accept a write a leader failure could erase. RF buys you the copies; min.insync.replicas is what actually makes them count. Always set it explicitly to 2 for important data rather than trusting the broker default.

My read: topic design is the one area of Kafka where a few minutes of thought at creation time saves a migration project later. Decide what is reversible, over-provision the parts that are not, and pair acks=all with min.insync.replicas=2 every time durability matters.

— James

Build your AI Kafka workflow with DevOps AI ToolKit

DevOps AI ToolKit publishes prompts and automation guides for engineers running production streaming systems. Browse the full AI prompt library for prompts that help you document topic design decisions, review configs, and standardize topic creation across teams.

FAQ

How many partitions should a Kafka topic have?

Enough to meet your target throughput at your measured per-partition consumer rate, plus headroom. Because adding partitions later breaks ordering for keyed topics, size up front rather than planning to grow.

Can I reduce the number of partitions later?

No. Kafka only allows increasing partition count. Decreasing it is not supported, which is another reason to size with headroom at creation time.

What is the right replication factor for production?

RF=3 is the standard. It tolerates the loss of one broker with zero data loss and still leaves headroom for a planned maintenance window. RF=1 is for dev and test only.

When should I use log compaction instead of retention?

Use compact when the topic represents current state per key — a changelog, the latest config per service, or a Kafka Streams state store. Use the default delete policy for event streams where old records lose value.