Kafka Cluster Sizing & Capacity Planning Prompt
Size a Kafka cluster end to end — broker count, partition counts, retention, disk, memory, and network — for a target throughput, with headroom for spikes and broker failure.
- Target user: SRE and platform engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Kafka capacity-planning engineer producing a sizing proposal to review before any hardware or cloud resources are committed.

I will provide:
- Target throughput: peak and average write rate (MB/s and messages/s), average message size, and read fan-out (number of consumer groups)
- Retention requirements per topic (time-based and/or size-based) and any compacted topics
- Replication factor and durability needs (acks, min.insync.replicas)
- Deployment target (cloud instance types or on-prem hardware), disk type (local NVMe vs networked), and budget constraints
- Availability target and how many broker failures the cluster must tolerate without data loss or throttling

Your job:
1. **Compute the storage footprint**: derive raw disk from write rate × retention × replication factor, then add overhead for index files, segment rolling, and a safety margin; show the math so it is auditable.
2. **Size brokers**: recommend a broker count that keeps per-broker disk, network, and partition load within safe limits, explicitly reserving headroom so the loss of N brokers does not saturate the survivors.
3. **Recommend partition counts**: tie partitions to target consumer parallelism and per-partition throughput limits, and warn about the cost of over-partitioning (controller load, open file handles, rebalance time).
4. **Size network and memory**: estimate replication and consumer-read bandwidth, and explain page-cache sizing so hot reads stay off disk.
5. **Plan growth**: give a re-evaluation trigger (e.g. disk or network utilization thresholds) and the cheapest scaling lever to pull first.

Output: (a) storage and broker math, (b) recommended broker/partition/RF configuration, (c) network and memory sizing, (d) failure-headroom check, (e) growth triggers and scaling plan.

Advisory only; validate assumptions with a load test against a staging cluster before provisioning production capacity.
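Outside the prompt itself, it can help to have a rough cross-check for the numbers a model returns. The Python sketch below works through the arithmetic behind steps 1 to 4 under loudly stated assumptions. The function names, the 10% index/segment overhead, the 30% safety margin, the 70% utilization ceiling, and the per-partition throughput figures are all illustrative placeholders, not Kafka defaults or recommendations. It also assumes that surviving brokers must absorb the full disk and network load after failures, and that consumers read only from leaders. Replace every input with values measured on your own hardware.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    peak_write_mb_s: float    # peak producer ingress, MB/s
    avg_write_mb_s: float     # average producer ingress, MB/s
    retention_hours: float    # time-based retention
    replication_factor: int
    consumer_groups: int      # read fan-out


def storage_tb(w: Workload, overhead: float = 0.10, safety: float = 0.30) -> float:
    """Step 1: avg write rate x retention x RF, plus overhead and margin (decimal TB)."""
    raw_mb = w.avg_write_mb_s * w.retention_hours * 3600 * w.replication_factor
    # overhead and safety are assumed placeholder percentages
    return raw_mb * (1 + overhead) * (1 + safety) / 1_000_000


def broker_count(w: Workload, total_tb: float, disk_tb_per_broker: float,
                 nic_mb_s_per_broker: float, tolerated_failures: int,
                 max_util: float = 0.70) -> int:
    """Step 2: brokers needed so the survivors of N failures stay under max_util."""
    rf = w.replication_factor
    # Cluster-wide inbound: producer writes plus follower copies landing on brokers.
    inbound = w.peak_write_mb_s * rf
    # Cluster-wide outbound: follower fetches plus one full read per consumer group.
    outbound = w.peak_write_mb_s * (rf - 1 + w.consumer_groups)
    for_disk = math.ceil(total_tb / (disk_tb_per_broker * max_util))
    for_net = math.ceil(max(inbound, outbound) / (nic_mb_s_per_broker * max_util))
    # Survivors must also be numerous enough to host RF replicas.
    survivors = max(for_disk, for_net, rf)
    return survivors + tolerated_failures


def partition_count(w: Workload, producer_mb_s_per_partition: float,
                    consumer_mb_s_per_partition: float, target_consumers: int) -> int:
    """Step 3: enough partitions for peak throughput and consumer parallelism."""
    return max(
        math.ceil(w.peak_write_mb_s / producer_mb_s_per_partition),
        math.ceil(w.peak_write_mb_s / consumer_mb_s_per_partition),
        target_consumers,
    )


def page_cache_gb(w: Workload, brokers: int, lag_window_s: float) -> float:
    """Step 4: RAM per broker to keep the last lag_window_s of writes in page cache."""
    per_broker_mb_s = w.peak_write_mb_s * w.replication_factor / brokers
    return per_broker_mb_s * lag_window_s / 1000


if __name__ == "__main__":
    w = Workload(peak_write_mb_s=150, avg_write_mb_s=50, retention_hours=72,
                 replication_factor=3, consumer_groups=4)
    tb = storage_tb(w)
    n = broker_count(w, tb, disk_tb_per_broker=8, nic_mb_s_per_broker=1250,
                     tolerated_failures=1)
    print(f"storage: {tb:.1f} TB, brokers: {n}")
    print(f"partitions: {partition_count(w, 10, 5, 24)}")
    print(f"page cache per broker: {page_cache_gb(w, n, 300):.1f} GB")
```

With these sample inputs the sketch lands at roughly 55.6 TB of disk, 11 brokers (10 to carry the load plus 1 tolerated failure), and 30 partitions. Disk is the binding constraint here. Note that the safety margin in step 1 and the utilization ceiling in step 2 stack, which makes the result deliberately conservative. If a model's proposal differs widely from a back-of-the-envelope figure like this, ask it to show which assumption differs.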
Related prompts
- Kafka Partition Reassignment & Broker Decommission Plan Prompt
Plan a safe partition reassignment or broker decommission using throttled data movement, staged batches, and verification, minimizing impact on live traffic.
- Kafka Topic Design & Partitioning Strategy Prompt
Design a Kafka topic from first principles — partition count, keying, replication factor, min.insync.replicas, and retention vs. compaction — to match ordering, scale, and durability needs.