
Vertical Pod Autoscaler (VPA) Tuning Prompt

Roll out the Vertical Pod Autoscaler safely — recommendation-only mode, update policies, container resource bounds, and the HPA coexistence trap — to right-size requests without restart storms.

Target user: SREs right-sizing CPU/memory requests on Kubernetes
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has used the Vertical Pod Autoscaler to right-size hundreds of workloads while avoiding the eviction storms that scare teams off VPA.

I will provide:
- The Deployment/StatefulSet spec (current requests/limits)
- Whether an HPA already targets this workload
- Workload profile (steady, bursty, batch, JVM/heap-bound, sidecar-heavy)
- VPA version and whether the admission controller is installed
- Goal (cut over-provisioning, stop OOMKills, set sane defaults for a new app)

Work through this methodically:

1. **Mode selection**: explain `updateMode: Off` (recommendation only), `Initial`, `Recreate`, and `Auto`, and say whether `Auto` behaves differently from `Recreate` in my VPA version. Tell me which mode to start with and why `Off` is almost always the right first move. Show how to read `status.recommendation` (the per-container `target`, `lowerBound`, `upperBound`, and `uncappedTarget` under `containerRecommendations`).

2. **The HPA conflict**: VPA and HPA must NOT both act on CPU or memory for the same workload. Explain the supported patterns (HPA on a custom or external metric while VPA manages CPU and memory; HPA on CPU with VPA restricted to memory through `controlledResources`; or VPA in `Off` mode with recommendations applied manually) and how to keep the two controllers from fighting over the same resource.

3. **Container bounds**: set `resourcePolicy.containerPolicies` with `minAllowed`/`maxAllowed` per container; carve out sidecars with `mode: "Off"`; explain `controlledResources` and `controlledValues` (`RequestsOnly` vs `RequestsAndLimits`).

4. **JVM / runtime gotchas**: why memory recommendations mislead for heap-bound apps (observed usage tracks the configured heap rather than real demand); how to align `-Xmx` or `-XX:MaxRAMPercentage` with the recommended request; and why CPU throttling is driven by limits, not requests.

5. **Eviction safety**: how `Recreate` and `Auto` evict pods to apply changes; how to pair VPA with a PodDisruptionBudget; how to limit the number of pods disrupted at once; and how to confine changes to quiet windows.

6. **Rollout plan**: run `Off` for 1 to 2 weeks, harvest recommendations, diff them against current requests, apply the changes via a PR (not live mutation), then optionally graduate to `Auto` for non-critical tiers.

7. **Validation**: measure utilization, OOMKill rate, CPU throttle %, and cost before and after the change.

Output as: (a) a VPA manifest in `Off` mode with container bounds, (b) a decision table for which mode fits which workload, (c) the HPA-coexistence config if applicable, (d) a 3-phase rollout with go/no-go gates, (e) the exact metrics to watch.

Be conservative: prefer recommendation-only + reviewed PRs over live `Auto` mutation on anything user-facing.
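
To make steps 3 and 6 concrete, here is a minimal Python sketch that sits outside the prompt itself. It builds a VPA object in `Off` mode with per-container bounds, then compares harvested recommendations against current requests. The function names (`parse_quantity`, `build_vpa_off`, `diff_recommendations`), the `checkout` workload, and all sample numbers are hypothetical. The manifest fields assume the `autoscaling.k8s.io/v1` VPA API, and the quantity parser handles only common suffixes, not exponent notation.

```python
import json
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (assumption: exponent
# forms such as "1e3" do not appear in the inputs and are not handled).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '250m' or '512Mi' to cores or bytes."""
    # Check two-letter suffixes first so "Mi" is never mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)  # plain number, e.g. "2" cores or "262144000" bytes


def build_vpa_off(name, kind, container, min_allowed, max_allowed):
    """Return a recommendation-only VPA manifest as a dict."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{name}-vpa"},  # hypothetical naming scheme
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": name},
            "updatePolicy": {"updateMode": "Off"},  # recommend, never evict
            "resourcePolicy": {"containerPolicies": [{
                "containerName": container,
                "minAllowed": min_allowed,
                "maxAllowed": max_allowed,
                "controlledResources": ["cpu", "memory"],
                "controlledValues": "RequestsOnly",
            }]},
        },
    }


def diff_recommendations(vpa_status, current_requests):
    """Map container -> resource -> (current, target, target/current ratio)."""
    result = {}
    recs = vpa_status.get("recommendation", {}).get("containerRecommendations", [])
    for rec in recs:
        current = current_requests.get(rec["containerName"], {})
        for resource, target in rec["target"].items():
            if resource not in current:
                continue  # no current request to compare against
            ratio = parse_quantity(target) / parse_quantity(current[resource])
            result.setdefault(rec["containerName"], {})[resource] = (
                current[resource], target, round(float(ratio), 2))
    return result


if __name__ == "__main__":
    vpa = build_vpa_off("checkout", "Deployment", "app",
                        {"cpu": "100m", "memory": "128Mi"},
                        {"cpu": "2", "memory": "2Gi"})
    print(json.dumps(vpa, indent=2))  # JSON is valid YAML for kubectl apply

    # Sample status shaped like `kubectl get vpa checkout-vpa -o json`.
    status = {"recommendation": {"containerRecommendations": [
        {"containerName": "app",
         "target": {"cpu": "250m", "memory": "262144000"}}]}}
    current = {"app": {"cpu": "1", "memory": "1Gi"}}
    print(json.dumps(diff_recommendations(status, current), indent=2))
```

A ratio well below 1.0 suggests over-provisioning and a ratio above 1.0 suggests the container is under-requested. Either way, the output is meant to feed a reviewed PR rather than a live change, which matches the conservative stance of the prompt.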
