Resource Requests, Limits and HPA Right-Sizing Prompt
Right-size CPU/memory requests and limits from observed usage, and pair them with a sane HPA so the workload scales on the correct signal without thrashing or getting OOMKilled.
- Target user: SREs and capacity engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior capacity engineer right-sizing a workload's resource requests/limits and tuning its HorizontalPodAutoscaler. Base every number on the usage data, not round defaults.
I will provide:
- Current requests/limits and replica count from the Deployment
- Observed usage: `kubectl top pods` over time, or Prometheus p50/p95/p99 for cpu and memory, and any OOMKill/throttle events
- The current HPA spec (metric, target, min/max) and observed scaling behavior
- The workload type (latency-sensitive request server, batch, JVM/runtime with GC, etc.)
Your job:
1. **Size memory**: set the request near the steady-state working set and the limit near peak; explain why a memory limit far above the request invites node OOM and noisy-neighbor risk, and why Guaranteed QoS requires limit == request for both cpu and memory on every container in the pod.
2. **Size cpu**: set the request to typical load (it drives scheduling and HPA math); decide whether to set a cpu limit at all, given CFS throttling risks for latency-sensitive apps and the fact that omitting it rules out Guaranteed QoS.
3. **Pick the HPA metric**: confirm that a cpu-utilization HPA only makes sense if cpu tracks load; otherwise recommend a custom/external metric (RPS, queue depth), and explain the request-relative math (utilization is a percentage of the request).
4. **Tune HPA bounds**: set minReplicas for baseline HA, maxReplicas for the ceiling, the target value, and stabilization windows / scale-down policies to stop flapping.
5. **Check interactions**: make sure requests are set (a utilization-based HPA needs them), and that the HPA and any VPA don't fight over the same resource.
6. **State the trade-offs**: cost vs headroom vs latency.
Output: (a) recommended requests/limits with the data point behind each, (b) the HPA spec with metric, target, bounds, and behavior, (c) what to watch after rollout.
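The request-relative math that step 3 of the prompt asks for can be sketched in a few lines of Python. This is a simplified illustration, not the controller's actual code: it applies the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance, and ignores per-pod averaging, unready pods, and stabilization windows. The function names and the sample numbers (300m usage, 70% target, 2 to 10 replicas) are hypothetical.

```python
import math


def cpu_utilization_pct(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage as a percentage of the cpu request."""
    if request_millicores <= 0:
        # Without a request there is nothing to divide by, which is why
        # utilization-based HPAs need requests to be set.
        raise ValueError("a cpu request must be set for utilization-based HPA")
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_pct: float, target_pct: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation, clamped to [min_replicas, max_replicas]."""
    ratio = current_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: no scaling action.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    usage = 300  # hypothetical average usage per pod, in millicores
    for request in (500, 250):
        pct = cpu_utilization_pct(usage, request)
        replicas = desired_replicas(4, pct, target_pct=70.0,
                                    min_replicas=2, max_replicas=10)
        print(f"request={request}m utilization={pct:.0f}% desired={replicas}")
```

With identical usage, halving the request from 500m to 250m doubles the reported utilization from 60% to 120% and moves the desired replica count from 4 to 7. Requests and the HPA target therefore have to be sized together.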
Related prompts
- Kubernetes HPA Debugging Prompt: Diagnose HorizontalPodAutoscaler issues (flapping replicas, `unable to fetch metrics`, custom metrics adapter, behavior tuning, scale-from-zero patterns).
- Kubernetes Resource Limits & OOMKilled Tuning Prompt: Tune CPU/memory requests and limits to stop OOMKilled, fix throttling, right-size HPA targets, and avoid noisy-neighbor scheduling issues.