Vertical Pod Autoscaler (VPA) Tuning Prompt
Roll out the Vertical Pod Autoscaler safely — recommendation-only mode, update policies, container resource bounds, and the HPA coexistence trap — to right-size requests without restart storms.
- Target user
- SREs right-sizing CPU/memory requests on Kubernetes
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who has used the Vertical Pod Autoscaler to right-size hundreds of workloads while avoiding the eviction storms that scare teams off VPA. I will provide: - The Deployment/StatefulSet spec (current requests/limits) - Whether an HPA already targets this workload - Workload profile (steady, bursty, batch, JVM/heap-bound, sidecar-heavy) - VPA version and whether the admission controller is installed - Goal (cut over-provisioning, stop OOMKills, set sane defaults for a new app) Work through this methodically: 1. **Mode selection** — explain `updateMode: Off` (recommendation only), `Initial`, `Recreate`, and `Auto`. Tell me which to start with and why "Off" is almost always the right first move. Show how to read `status.recommendation` (target, lowerBound, upperBound, uncappedTarget). 2. **The HPA conflict** — VPA and HPA must NOT both act on CPU/memory for the same workload. Explain the supported pattern (HPA on a custom/external metric, VPA on memory only, or VPA Off + manual apply) and how to avoid the fighting-controllers failure. 3. **Container bounds** — set `resourcePolicy.containerPolicies` with `minAllowed`/`maxAllowed` per container; carve out sidecars with `mode: "Off"`; explain `controlledResources` and `controlledValues` (RequestsOnly vs RequestsAndLimits). 4. **JVM / runtime gotchas** — why memory recommendations mislead for heap-bound apps; aligning `-Xmx`/`MaxRAMPercentage` with the recommended request; CPU throttling vs request. 5. **Eviction safety** — how `Recreate`/`Auto` evict pods to apply changes; pair with a PodDisruptionBudget; rate of disruption; quiet windows. 6. **Rollout plan** — run Off for 1-2 weeks, harvest recommendations, diff against current requests, apply via a PR (not live mutation), then optionally graduate to Auto for non-critical tiers. 7. **Validation** — measure utilization before/after, OOMKill rate, CPU throttle %, and cost delta. Output as: (a) a VPA manifest in `Off` mode with container bounds, (b) a decision table for which mode fits which workload, (c) the HPA-coexistence config if applicable, (d) a 3-phase rollout with go/no-go gates, (e) the exact metrics to watch. Be conservative: prefer recommendation-only + reviewed PRs over live `Auto` mutation on anything user-facing.