OpenStack Capacity Planning Prompt
Plan OpenStack capacity — CPU/RAM/disk oversubscription, growth modeling, hypervisor sizing, Cinder backend planning, network bandwidth.
- Target user: OpenStack platform engineers and capacity planners
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack platform engineer who has planned and operated clouds with thousands of hypervisors. You know how to set oversubscription ratios that balance density with predictable performance.

I will provide:
- Current state: hypervisor count, vCPUs/RAM per HV, usage stats
- Growth projections (instances/month)
- Workload characteristics (general purpose, GPU, memory-heavy, latency-sensitive)
- SLA requirements

Your job:

1. **Establish baseline**:
   - Total physical CPU / RAM / Disk
   - Current allocation (sum of instance requests)
   - Current actual utilization (sum of measured use)
   - Allocation ratio = allocation / capacity
2. **Apply oversubscription thoughtfully**:
   - **CPU**: typical 4-8x for general workloads, 1:1 for HPC
   - **Memory**: 1-1.5x (memory is harder to reclaim)
   - **Disk**: 1x (no virtual disk oversubscription)
   - Configure via `cpu_allocation_ratio`, `ram_allocation_ratio` per host or aggregate
3. **Calculate effective capacity**:
   - effective vCPUs = pCPUs × cpu_ratio
   - effective RAM = pRAM × ram_ratio
   - Available vCPUs = effective - allocated
4. **Growth model**:
   - Instances per month + average size
   - Months to capacity = (effective - allocated) / monthly demand
   - When to expand (with lead time)
5. **For workload mix**:
   - General-purpose pool with moderate oversubscription
   - GPU/HPC pool with 1:1 (no oversubscription)
   - Memory-heavy DB pool with lower ratio
   - Use host aggregates + flavor extra_specs
6. **Storage capacity**:
   - Cinder backend capacity
   - Ephemeral disk on compute
   - Glance image cache
   - Plan headroom for snapshots / backups
7. **Network capacity**:
   - Tenant network bandwidth aggregate
   - Public IPs available
   - Floating IP pool
   - Bandwidth per compute (NIC)
8. **Quota planning**:
   - Per-project quotas
   - Sum of quotas vs cluster capacity (over-allocation OK if not all use simultaneously)

Mark DESTRUCTIVE: lowering allocation ratios on a busy cluster (existing VMs may not fit), under-provisioning critical pools (failure mode), ignoring storage headroom (cluster-wide writes fail).

---

Current capacity: [DESCRIBE]
Workload mix: [DESCRIBE]
Growth rate: [DESCRIBE]
SLA: [DESCRIBE: uptime, performance]
Why this prompt works
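Capacity planning is part observation, part modeling. This prompt covers both: it starts from measured allocation and utilization, then models effective capacity and growth against those numbers.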
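The modeling half can be checked by hand before pasting numbers into the prompt. The following is a minimal sketch of the baseline and effective-capacity arithmetic from steps 1 and 3 of the prompt. The `Pool` class, its field names, and the sample figures are hypothetical and exist only for illustration; they are not part of any OpenStack API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """One capacity pool, e.g. a host aggregate (hypothetical helper, not an OpenStack type)."""
    physical_cpus: int        # schedulable pCPUs across the pool
    physical_ram_gb: float    # physical RAM across the pool
    cpu_ratio: float          # the pool's cpu_allocation_ratio
    ram_ratio: float          # the pool's ram_allocation_ratio
    allocated_vcpus: int      # sum of vCPUs requested by instances
    allocated_ram_gb: float   # sum of RAM requested by instances

    def effective_vcpus(self) -> float:
        # effective vCPUs = pCPUs x cpu_ratio
        return self.physical_cpus * self.cpu_ratio

    def effective_ram_gb(self) -> float:
        # effective RAM = pRAM x ram_ratio
        return self.physical_ram_gb * self.ram_ratio

    def available_vcpus(self) -> float:
        return self.effective_vcpus() - self.allocated_vcpus

    def available_ram_gb(self) -> float:
        return self.effective_ram_gb() - self.allocated_ram_gb

    def observed_cpu_ratio(self) -> float:
        # allocation / physical capacity: the ratio the pool is actually running at
        return self.allocated_vcpus / self.physical_cpus

    def cpu_allocation_fraction(self) -> float:
        # share of effective vCPU capacity already handed out
        return self.allocated_vcpus / self.effective_vcpus()


# Illustrative numbers only: 20 hosts x 64 pCPUs and 512 GB each.
general = Pool(physical_cpus=1280, physical_ram_gb=10240,
               cpu_ratio=4.0, ram_ratio=1.0,
               allocated_vcpus=3584, allocated_ram_gb=7168)

print(general.effective_vcpus())                      # 5120.0
print(general.available_vcpus())                      # 1536.0
print(general.available_ram_gb())                     # 3072.0
print(round(general.cpu_allocation_fraction(), 2))    # 0.7
```

Keeping the observed ratio (allocation over physical capacity) separate from the allocated fraction (allocation over effective capacity) makes it easier to see whether a pool is close to its configured limit or merely dense.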
How to use it
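- Start with measurement: compare current utilization against current allocation.
- Define workload pools, each with its own allocation ratios.
- Model growth as months to capacity for each pool.
- Plan with lead time: decide when to expand early enough that new hardware is in service before the pool fills.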
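The growth and lead-time steps reduce to a short calculation. The sketch below applies the prompt's "months to capacity = (effective - allocated) / monthly demand" formula per resource and reports whichever resource runs out first. The function names, the three-month lead time, and the demand figures are assumptions chosen for illustration.

```python
import math


def months_to_capacity(headroom: float, monthly_demand: float) -> float:
    """Months until headroom (effective - allocated) is consumed."""
    if monthly_demand <= 0:
        return math.inf  # no growth means no exhaustion date
    return max(headroom, 0.0) / monthly_demand


def first_to_exhaust(headroom: dict, monthly_demand: dict) -> tuple:
    """Return (resource, months) for the resource that runs out first."""
    runways = {
        name: months_to_capacity(headroom[name], monthly_demand.get(name, 0.0))
        for name in headroom
    }
    return min(runways.items(), key=lambda item: item[1])


def months_until_order(runway_months: float, lead_time_months: float) -> float:
    """Time left before an order must be placed; zero or less means order now."""
    return runway_months - lead_time_months


# Illustrative inputs: 40 new instances per month, averaging 4 vCPUs and 16 GB each.
headroom = {"vcpus": 1536, "ram_gb": 3072}
demand = {"vcpus": 40 * 4, "ram_gb": 40 * 16}

resource, runway = first_to_exhaust(headroom, demand)
print(resource, round(runway, 1))                  # ram_gb 4.8
print(round(months_until_order(runway, 3.0), 1))   # 1.8
```

In this example RAM, not CPU, sets the expansion date, which is common when the RAM ratio is kept near 1x while CPU is oversubscribed.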
Useful commands
# Hypervisor capacity
openstack hypervisor list --long
# Cluster-wide totals (the statistics API is only available up to compute microversion 2.87)
openstack hypervisor stats show
# Per-hypervisor detail
openstack hypervisor show <hostname>
# Resource providers (Placement; requires the osc-placement plugin)
openstack resource provider list
openstack resource provider inventory list <rp>
openstack resource provider usage show <rp>
# Aggregate-based pools
openstack aggregate list --long
openstack aggregate show <agg>
# Flavors
openstack flavor list
openstack flavor show <flavor>
# Project quotas
openstack quota list --compute --project <project>
openstack quota show <project>
# Instance count (one ID per line, so table borders and headers are not counted)
openstack server list --all-projects -f value -c ID | wc -l
# Compute service health
openstack compute service list
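The per-hypervisor listing can be totalled into the baseline figures the prompt asks for. The sketch below assumes the output of `openstack hypervisor list --long -f json` has been saved and parsed, and that it carries the columns `vCPUs`, `vCPUs Used`, `Memory MB`, and `Memory MB Used`. Those column names are an assumption and may differ or be empty depending on client version and compute API microversion; the sample rows are invented. Note that the "used" columns generally reflect what has been allocated to instances, not measured utilization.

```python
import json

# Assumed column names from `openstack hypervisor list --long -f json`;
# verify against your own client output before relying on them.
COLUMNS = {
    "vcpus": "vCPUs",
    "vcpus_used": "vCPUs Used",
    "memory_mb": "Memory MB",
    "memory_mb_used": "Memory MB Used",
}


def summarize_hypervisors(rows: list) -> dict:
    """Sum per-hypervisor columns into cluster totals plus the observed vCPU ratio."""
    totals = {key: 0 for key in COLUMNS}
    for row in rows:
        for key, column in COLUMNS.items():
            # Missing or empty values count as zero rather than raising.
            totals[key] += int(row.get(column) or 0)
    totals["vcpu_allocation_ratio"] = (
        totals["vcpus_used"] / totals["vcpus"] if totals["vcpus"] else 0.0
    )
    return totals


# Invented sample data standing in for the saved CLI output.
sample = json.loads("""
[
  {"Hypervisor Hostname": "hv-01", "vCPUs": 64, "vCPUs Used": 180,
   "Memory MB": 524288, "Memory MB Used": 401408},
  {"Hypervisor Hostname": "hv-02", "vCPUs": 64, "vCPUs Used": 140,
   "Memory MB": 524288, "Memory MB Used": 262144}
]
""")

summary = summarize_hypervisors(sample)
print(summary["vcpus"], summary["vcpus_used"])   # 128 320
print(summary["vcpu_allocation_ratio"])          # 2.5
```

With real data, replace `sample` with `json.load(open("hypervisors.json"))` after redirecting the CLI output to that file. If the columns come back empty, the Placement inventory and usage commands listed above are the alternative source.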
Common findings this catches
- Allocation above 70% of effective capacity and still growing at the current rate: expand soon.
- High allocation but low measured usage: the current oversubscription is reasonable and the pool can absorb growth.
- One aggregate nearly full while others sit empty: check flavor design or scheduler bias.
- Snapshot accumulation consuming the Cinder pool: implement a retention policy.
- Sum of quotas far above capacity during an incident: some projects are starved; revisit quotas.
- Network bandwidth saturated in one compute pool: rebalance or upgrade NICs.
- Memory oversubscription causing OOM kills: reduce the RAM allocation ratio.
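Several of these findings are simple threshold checks that can be run before the prompt is used. The sketch below encodes three of them. Only the 70% allocation threshold comes from the list above; the 30% low-usage threshold, the 90% and 30% aggregate thresholds, and all function names are assumptions to tune for your environment.

```python
def capacity_findings(allocated_fraction, utilized_fraction,
                      quota_sum, effective_capacity,
                      allocation_threshold=0.70, low_usage_threshold=0.30):
    """Flag pool-level findings from allocation, measured usage, and quota totals."""
    findings = []
    if allocated_fraction > allocation_threshold:
        if utilized_fraction < low_usage_threshold:
            findings.append("high allocation, low usage: oversubscription can absorb growth")
        else:
            findings.append("allocation above threshold: expand soon")
    if quota_sum > effective_capacity:
        findings.append("quota sum exceeds capacity: projects may be starved at peak")
    return findings


def near_full_aggregates(allocation_by_aggregate, full=0.90, empty=0.30):
    """Aggregates at or above `full` while at least one other is below `empty`."""
    has_empty = any(frac < empty for frac in allocation_by_aggregate.values())
    if not has_empty:
        return []
    return sorted(name for name, frac in allocation_by_aggregate.items() if frac >= full)


# Illustrative inputs only.
print(capacity_findings(0.82, 0.65, quota_sum=9000, effective_capacity=5120))
print(capacity_findings(0.82, 0.15, quota_sum=4000, effective_capacity=5120))
print(near_full_aggregates({"general": 0.95, "gpu": 0.20, "db": 0.60}))  # ['general']
```

The first call reports both the expansion and quota findings; the second reports only the high-allocation, low-usage case.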
When to escalate
- Less than 90 days of capacity runway: start emergency procurement.
- Workload shifts that require new pool types: design them with stakeholders.
- Capacity differences across regions: treat as strategic planning.
Related prompts
- Linux High Load & CPU Runaway Investigation Prompt: Diagnose high load average, CPU saturation, run-queue pressure, IRQ storms, and steal time on Linux servers; distinguish user CPU vs system CPU vs I/O wait vs steal.
- Nova Scheduler Filter Analysis Prompt: Diagnose why VMs aren't landing on hosts by reviewing scheduler filters, weighers, host aggregates, placement allocations, and capacity.
- OpenStack Upgrade Pre-Flight Review Prompt: Pre-upgrade safety review of an OpenStack cluster moving release N → N+1, covering config drift, deprecated options, DB migrations, breaking changes, and service ordering.