DevOps AI ToolKit
AI for Kafka

Kafka Cluster Sizing & Capacity Planning Prompt

Size a Kafka cluster end to end — broker count, partition counts, retention, disk, memory, and network — for a target throughput, with headroom for spikes and broker failure.

Target user: SRE and platform engineers
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior Kafka capacity-planning engineer producing a sizing proposal for review before any hardware or cloud resources are committed.

I will provide:
- Target throughput: peak and average write rate (MB/s and messages/s), average message size, and read fan-out (number of consumer groups)
- Retention requirements per topic (time-based and/or size-based) and any compacted topics
- Replication factor and durability needs (acks, min.insync.replicas)
- Deployment target (cloud instance types or on-prem hardware), disk type (local NVMe vs networked), and budget constraints
- Availability target and how many broker failures (N) the cluster must tolerate without data loss or throttling
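The inputs above overlap on purpose: peak MB/s should equal peak messages/s times average message size, and the replication settings must leave room for the N tolerated failures. A minimal pre-flight check, assuming a hypothetical `WorkloadSpec` container (field names are illustrative, not part of the prompt) and decimal units (1 MB = 10^6 bytes):

```python
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    """Hypothetical container for the sizing inputs; field names are illustrative."""
    peak_write_mb_s: float          # peak producer write rate (1 MB = 10**6 bytes)
    avg_write_mb_s: float           # average producer write rate
    peak_msgs_s: float              # peak producer message rate
    avg_msg_bytes: float            # average message size in bytes
    consumer_groups: int            # read fan-out
    replication_factor: int
    min_insync_replicas: int
    tolerated_broker_failures: int  # N from the availability target

def check_spec(spec: WorkloadSpec, tolerance: float = 0.10) -> list[str]:
    """Return inconsistencies in the inputs; an empty list means none were found."""
    problems = []
    # Messages/s times average size should reproduce the stated MB/s within tolerance.
    implied_mb_s = spec.peak_msgs_s * spec.avg_msg_bytes / 1e6
    if abs(implied_mb_s - spec.peak_write_mb_s) > tolerance * spec.peak_write_mb_s:
        problems.append(
            f"peak msgs/s x avg size = {implied_mb_s:.1f} MB/s, "
            f"but stated peak is {spec.peak_write_mb_s:.1f} MB/s"
        )
    if spec.avg_write_mb_s > spec.peak_write_mb_s:
        problems.append("average write rate exceeds peak write rate")
    if spec.min_insync_replicas > spec.replication_factor:
        problems.append("min.insync.replicas exceeds the replication factor")
    # With acks=all, a partition rejects writes once its in-sync replicas drop below
    # min.insync.replicas; if the N failed brokers all hold replicas of one partition,
    # only RF - N replicas of it remain.
    elif spec.replication_factor - spec.tolerated_broker_failures < spec.min_insync_replicas:
        problems.append(
            "losing N brokers can leave fewer than min.insync.replicas replicas, "
            "blocking acks=all writes to affected partitions"
        )
    return problems

spec = WorkloadSpec(peak_write_mb_s=200, avg_write_mb_s=80, peak_msgs_s=200_000,
                    avg_msg_bytes=1_000, consumer_groups=3, replication_factor=3,
                    min_insync_replicas=2, tolerated_broker_failures=1)
print(check_spec(spec))
```

With `tolerated_broker_failures=2` and the same RF 3 and `min.insync.replicas` 2, the check reports one problem, because a partition whose replicas sit on both failed brokers keeps only one in-sync replica.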

Your job:

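1. **Compute the storage footprint** — derive raw disk from average write rate × retention × replication factor, then add overhead for index files and segment rolling (retention deletes only whole closed segments, so each partition replica can hold roughly one extra segment beyond the retention window) plus a safety margin; show the math so it is auditable.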
2. **Size brokers** — recommend a broker count that keeps per-broker disk, network, and partition load within safe limits, explicitly reserving headroom so the loss of N brokers does not saturate the survivors.
3. **Recommend partition counts** — tie partitions to target consumer parallelism and per-partition throughput limits, and warn about the cost of over-partitioning (controller load, open file handles, rebalance time).
4. **Size network and memory** — estimate replication and consumer-read bandwidth, and explain page-cache sizing so hot reads stay off disk.
5. **Plan growth** — give a re-evaluation trigger (e.g. disk or network utilization thresholds) and the cheapest scaling lever to pull first.
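Steps 1 and 2 reduce to a few lines of arithmetic. The sketch below is illustrative only and assumes hypothetical defaults you should replace with your own policy: 10% overhead for indexes and not-yet-deleted segments, disks filled to at most 70%, NICs run at no more than 70% of line rate, and decimal units (1 TB = 10^6 MB). Storage uses the average write rate; network uses the peak.

```python
import math

def storage_footprint_tb(avg_write_mb_s: float, retention_hours: float,
                         replication_factor: int, overhead: float = 0.10,
                         max_disk_fill: float = 0.70) -> tuple[float, float]:
    """Return cluster-wide (raw_tb, provisioned_tb), using 1 TB = 10**6 MB."""
    # Bytes retained = average write rate over the retention window x window x RF.
    raw_tb = avg_write_mb_s * retention_hours * 3600 * replication_factor / 1e6
    # Assumed allowance for index files and closed segments awaiting deletion.
    with_overhead_tb = raw_tb * (1 + overhead)
    # Size disks so that a full retention window fills them only to max_disk_fill.
    return raw_tb, with_overhead_tb / max_disk_fill

def broker_count(provisioned_tb: float, disk_per_broker_tb: float,
                 peak_write_mb_s: float, replication_factor: int,
                 consumer_groups: int, nic_mb_s: float,
                 tolerated_failures: int, max_net_util: float = 0.70) -> int:
    """Smallest broker count satisfying disk, network and N-failure headroom."""
    by_disk = math.ceil(provisioned_tb / disk_per_broker_tb)
    # Cluster-wide traffic at peak: inbound = producer writes + replica copies,
    # outbound = replica copies + one full read per consumer group.
    inbound = peak_write_mb_s * replication_factor
    outbound = peak_write_mb_s * (replication_factor - 1 + consumer_groups)
    # The brokers left after N failures must carry all traffic under max_net_util.
    # Conservative: replication traffic to the failed brokers stops, which is ignored here.
    survivors = math.ceil(max(inbound, outbound) / (nic_mb_s * max_net_util))
    by_network = survivors + tolerated_failures
    # Each replica of a partition needs its own broker.
    return max(by_disk, by_network, replication_factor)

raw, provisioned = storage_footprint_tb(avg_write_mb_s=80, retention_hours=72,
                                        replication_factor=3)
brokers = broker_count(provisioned, disk_per_broker_tb=8, peak_write_mb_s=200,
                       replication_factor=3, consumer_groups=3, nic_mb_s=1250,
                       tolerated_failures=1)
print(f"raw {raw:.1f} TB, provisioned {provisioned:.1f} TB, {brokers} brokers")
```

With 80 MB/s average, 72 h retention, RF 3, 8 TB per broker, 200 MB/s peak, three consumer groups, 10 GbE (about 1,250 MB/s) and N = 1, disk dominates: about 62 TB raw, 98 TB provisioned, 13 brokers. Disk is not padded for failures because Kafka does not automatically move replicas off a failed broker; add N to the disk-driven count if your runbook reassigns them onto the survivors.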

Output: (a) storage and broker math, (b) recommended broker/partition/RF configuration, (c) network and memory sizing, (d) failure-headroom check, (e) growth triggers and scaling plan.
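For sections (c) and (e) of the output, per-broker network load and page cache follow from the same traffic model. A minimal sketch, assuming a hypothetical `network_and_memory` helper, a 30-second page-cache buffer per broker (a commonly used rule of thumb, not a Kafka setting), a 6 GB JVM heap, and a re-evaluation trigger at 60% NIC utilization after N failures; all three defaults are placeholders:

```python
def network_and_memory(peak_write_mb_s: float, replication_factor: int,
                       consumer_groups: int, brokers: int,
                       tolerated_failures: int, nic_mb_s: float,
                       cache_seconds: float = 30, heap_gb: float = 6,
                       reeval_util: float = 0.60) -> dict:
    """Per-broker network (MB/s) and RAM (GB); every default is an assumption."""
    # Cluster-wide peak traffic: inbound = producer writes + follower copies,
    # outbound = data sent to followers + one full read per consumer group.
    inbound = peak_write_mb_s * replication_factor
    outbound = peak_write_mb_s * (replication_factor - 1 + consumer_groups)
    survivors = brokers - tolerated_failures
    # Every byte a broker receives is appended to its log, so the page cache needs
    # roughly cache_seconds of that write rate to serve tailing reads from memory.
    per_broker_write = inbound / brokers
    cache_gb = per_broker_write * cache_seconds / 1000
    # Worst-case NIC utilization: all traffic spread over the surviving brokers.
    worst_util = max(inbound, outbound) / survivors / nic_mb_s
    return {
        "in_mb_s": inbound / brokers,
        "out_mb_s": outbound / brokers,
        "out_mb_s_after_failures": outbound / survivors,
        "cache_gb": cache_gb,
        "ram_gb": heap_gb + cache_gb,
        # Re-evaluation trigger: utilization after N failures above the threshold.
        "reevaluate": worst_util > reeval_util,
    }

print(network_and_memory(peak_write_mb_s=200, replication_factor=3,
                         consumer_groups=3, brokers=6,
                         tolerated_failures=1, nic_mb_s=1250))
```

For 200 MB/s peak, RF 3, three consumer groups, six brokers, N = 1 and 10 GbE, this gives about 100 MB/s in and 167 MB/s out per broker (200 MB/s out with one broker down), 3 GB of page cache on top of the heap, and no re-evaluation trigger. `ram_gb` excludes OS overhead and any non-Kafka processes.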

Advisory only; validate assumptions with a load test against a staging cluster before provisioning production capacity.
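Per-partition throughput is among the assumptions most worth measuring in staging, because the partition count in step 3 scales directly with it. A minimal sketch using a commonly cited rule of thumb, partitions ≥ max(T/p, T/c), where T is the topic's target throughput and p and c are the measured per-partition producer and consumer throughput; the helper name and the optional round-up to a multiple of the broker count are assumptions, not part of the prompt:

```python
import math

def partitions_for_topic(target_mb_s: float,
                         producer_mb_s_per_partition: float,
                         consumer_mb_s_per_partition: float,
                         max_consumers_in_group: int,
                         brokers: int = 0) -> int:
    """Partitions needed so neither side of a partition is the bottleneck.

    The per-partition rates are meant to come from a staging load test,
    not from vendor benchmarks or planning estimates.
    """
    by_produce = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consume = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # Consumers beyond the partition count sit idle, so parallelism needs
    # at least as many partitions as the largest group has consumers.
    partitions = max(by_produce, by_consume, max_consumers_in_group)
    # Optional (assumption): round up to a multiple of the broker count
    # so partition leadership can be spread evenly.
    if brokers > 0:
        partitions = math.ceil(partitions / brokers) * brokers
    return partitions

# Hypothetical staging results: 10 MB/s per partition produce, 20 MB/s consume.
print(partitions_for_topic(200, 10, 20, max_consumers_in_group=12, brokers=6))
```

Rerun it with the load-test numbers rather than the planning estimates; if the measured producer rate per partition halves, the produce-side count roughly doubles.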
