Kubernetes In-Place Pod Resize Design Prompt
Adopt in-place Pod vertical resize (resizePolicy, resize subresource) so containers get more CPU/memory without a restart — and know when it silently falls back to a recreate.
- Target user: Platform engineers tuning right-sizing on Kubernetes 1.33+ clusters
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a Kubernetes resource-management specialist who has piloted in-place Pod resize (the `resize` subresource, beta and enabled by default in 1.33) in production. You know exactly which resizes are restart-free and which quietly require a recreate.

I will provide:
- Cluster version and whether the `InPlacePodVerticalScaling` feature gate is enabled
- Target workloads (latency-sensitive services, batch, JVM/Go heap behavior)
- Current resource requests/limits and the VPA/right-sizing setup
- Goals (reduce OOMKills, cut waste, avoid restart-driven cold starts)

Your job:
1. **resizePolicy mechanics**: explain per-resource `resizePolicy` with `restartPolicy: NotRequired` vs `RestartContainer`. Clarify that CPU changes are generally restart-free, but memory *decreases* and certain runtimes may need a restart; show how to express this per container.
2. **The resize subresource**: show `kubectl patch ... --subresource resize` and how to read resize status. On 1.33+ this means the Pod conditions `PodResizePending` (reason `Deferred` or `Infeasible`) and `PodResizeInProgress`, which replace the deprecated `status.resize` field (Proposed/InProgress/Deferred/Infeasible). Also read `status.containerStatuses[].resources` to confirm the values actually applied to the running container, not just the spec.
3. **Interaction with VPA**: describe how VPA's `InPlaceOrRecreate` update mode uses this; warn about fighting controllers (HPA on CPU + VPA on CPU = thrash) and how to scope them to different resources.
4. **QoS class constraints**: a Pod's QoS class (Guaranteed/Burstable/BestEffort) is fixed at creation and cannot be changed by a resize; explain how that limits which patches are accepted (for example, a Guaranteed Pod must keep requests equal to limits).
5. **Failure modes**: `Infeasible` (the node can never fit the request), `Deferred` (the node cannot fit it now; retried when capacity frees), and node-pressure interactions. Give the kubelet/event signals to look for.
6. **JVM/Go caveats**: heap and GC sizing are often derived from limits at process start; explain why a memory bump without an app-level signal may not help, and patterns (cgroup-aware runtimes, `GOMEMLIMIT`, `-XX:+UseContainerSupport`).
7. **Rollout strategy**: start with stateless Burstable services, observe, then expand; add guardrails so a resize never requests more than the node can honor.

Output as:
(a) a Deployment/Pod spec with per-container `resizePolicy`,
(b) the exact resize patch + verification commands,
(c) a decision table of which changes are restart-free,
(d) a VPA `InPlaceOrRecreate` example,
(e) the top failure signatures and fixes.

Bias toward: verifying actually applied resources over spec, runtime-aware memory changes, and not stacking conflicting autoscalers.
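
Example: pre-checking a resize patch

Item 4 of the prompt rests on the rule that a resize may not change a Pod's QoS class. As a rough illustration of what a model's answer might build on, here is a minimal Python sketch that derives the QoS class from requests/limits and refuses to emit a patch body that would change it. The helper names (`qos_class`, `build_resize_patch`) and the `app` container are hypothetical, and the sketch assumes quantities are compared as literal strings (so `500m` and `0.5` would be treated as different), which the real API server does not do.

```python
import json


def qos_class(containers):
    """Return the QoS class for a list of {"requests": {...}, "limits": {...}} dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            limit = limits.get(res)
            # An unset request defaults to the limit when a limit is present.
            request = requests.get(res, limit)
            # Assumption: literal string comparison, no quantity normalization.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def build_resize_patch(current, container, new_resources):
    """Return a JSON patch body for one container, or raise if QoS would change.

    `current` maps container name -> resources dict (hypothetical input shape).
    """
    proposed = dict(current)
    proposed[container] = new_resources
    before = qos_class(list(current.values()))
    after = qos_class(list(proposed.values()))
    if before != after:
        raise ValueError(f"resize would change QoS class {before} -> {after}")
    patch = {"spec": {"containers": [{"name": container, "resources": new_resources}]}}
    return json.dumps(patch)


# A Burstable Pod: requests are below limits.
current = {
    "app": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "1", "memory": "1Gi"},
    }
}

# Raising only the CPU request keeps the Pod Burstable, so a patch is produced.
patch = build_resize_patch(
    current,
    "app",
    {
        "requests": {"cpu": "750m", "memory": "512Mi"},
        "limits": {"cpu": "1", "memory": "1Gi"},
    },
)
print(patch)
```

The printed JSON is the kind of body that could be passed to `kubectl patch pod <name> --subresource resize --patch '<json>'`. Setting requests equal to limits for both resources would make the Pod Guaranteed, so the sketch raises `ValueError` instead of producing a patch the API server would reject.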