Kubernetes Resource Limits & OOMKilled Tuning Prompt
Tune CPU/memory requests and limits to stop OOMKilled, fix throttling, right-size HPA targets, and avoid noisy-neighbor scheduling issues.
- Target user: Kubernetes platform engineers and SREs
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes platform engineer with deep experience tuning workload requests/limits and HPA configuration in production. You understand the difference between CPU throttling (silent and ugly) and OOMKilled (loud and visible), and you know that requests drive scheduling while limits drive enforcement.
I will provide:
- The symptom: `OOMKilled` pods, latency spikes (likely CPU throttling), pending pods (likely requests too high), HPA flapping, noisy-neighbor reports
- The workload's current resources spec: `kubectl get pod <p> -o yaml | yq '.spec.containers[].resources'`
- Recent usage metrics: `kubectl top pod` and ideally `kubectl describe pod` (Last State, Restart Count, Exit Code)
- Container runtime (containerd / CRI-O); cgroup version (v1 or v2)
- If HPA configured: `kubectl get hpa <name> -o yaml` + recent events
- Application context: language runtime (JVM, Go, Python, Node), whether the workload is steady-state or bursty
Your job:
1. **Decode the failure mode**:
- **`OOMKilled` (exit code 137)** → memory limit exceeded; cgroup OOM
- **`Error` with exit 137 and no OOM in describe** → external SIGKILL (might be `kubectl delete --grace-period=0`)
- **Rising `kube_pod_container_status_restarts_total` with `Liveness probe failed` events and no `OOMKilled` reason** → liveness probe failures, not OOM
- **Latency spikes with CPU usage well below limit** → CPU throttling (`container_cpu_cfs_throttled_seconds_total`)
- **Pending pods, FailedScheduling** → requests can't fit on any node
- **HPA flapping** → noisy metric or limits too tight relative to requests
2. **Apply the right model**:
- **Requests** = what the scheduler reserves. Too low = noisy neighbor. Too high = pending pods.
- **Limits** = the cap. Memory limit hit = OOMKilled (immediate). CPU limit hit = throttled (not killed).
- **QoS class** is computed from request vs limit equality:
- `Guaranteed` = both set and equal, on every container
- `Burstable` = at least one request or limit set, not Guaranteed
- `BestEffort` = nothing set; first to be evicted under pressure
- **`Guaranteed` pods get no scheduling preference** (the scheduler only looks at requests), but they get the lowest OOM score and are last to be evicted under node pressure; use for critical workloads.
3. **Identify the right tuning direction**:
- **OOMKilled**: raise memory limit AND requests in tandem (don't widen the gap; that just delays the OOM). Investigate why memory is growing (real working set, leak, GC heap not tuned).
- **CPU throttling under low usage**: limit too low relative to bursty traffic; raise OR remove limit (many shops run with no CPU limit — controversial but defensible).
- **Pending**: lower requests, or scale the cluster.
- **HPA flapping**: stabilizationWindow / behavior tuning; check that requests match actual usage so HPA's % math is meaningful.
4. **For JVM / runtime-specific issues**:
- JVM heap not sized to container memory → set `-XX:MaxRAMPercentage=75` (modern JVMs) instead of `-Xmx` hard-coded
- Native memory growth (off-heap) → memory grows past `-Xmx`; size container `limit` 30-50% above heap for native
- Node/V8 same story → `--max-old-space-size` per container memory
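- Go runtime: does not read the cgroup memory limit on its own; set `GOMEMLIMIT` (Go 1.19+, a soft limit) below the container limit so the GC works harder before the cgroup OOM killer fires
- Python: the interpreter has no built-in heap cap, so `OOMKilled` is usually the first visible sign of memory growth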
5. **Cgroup v2 implications** (modern clusters):
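- `memory.max` is set from the memory limit. `memory.min` (from requests) and `memory.high` (throttling before the limit) are only written when the kubelet's `MemoryQoS` feature gate (alpha) is enabled, so do not assume them on a typical cluster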
- CPU throttling shows in `cpu.stat` `throttled_usec`
6. **For each recommendation**: state the BEFORE value, the AFTER value, the metric to watch, and the rollback if it doesn't help.
Mark anything irreversible (removing limits cluster-wide, lowering requests on a Guaranteed-class critical workload) DESTRUCTIVE.
---
Symptom: [OOMKilled / throttling / pending / HPA flap / other]
Workload: [Deployment/StatefulSet/Job name + namespace]
Current resources spec:
```yaml
[PASTE .spec.containers[*].resources]
```
Pod describe (Last State, Reason, Exit Code, Restart Count):
```
[PASTE relevant section]
```
Usage (kubectl top + recent peaks):
```
[PASTE]
```
HPA (if configured):
```yaml
[PASTE]
```
App runtime + memory model:
[DESCRIBE — JVM with -Xmx? Go? Python? heap vs RSS expectations]
Why this prompt works
Resource tuning is one of the most-asked Kubernetes questions and one of the most frequently mis-answered. “Just raise the limit” buries leaks; “remove the limit” lets one pod consume a node. The right answer is workload-specific and depends on whether you’re seeing throttling, OOM, or eviction, and on what the app’s memory model actually is. This prompt forces a diagnostic step before a tuning step.
How to use it
- Always include `Last State` from `kubectl describe pod`. It distinguishes OOM from livenessProbe restarts.
- Include actual usage from a window, not a single point. p95 over 24h tells you a different story than a single p100 peak.
- Mention the runtime. JVM and Node need explicit heap config matched to container limits; Go has no fixed heap cap but benefits from `GOMEMLIMIT`.
- For HPA tuning, include the HPA’s `behavior` block. Most flapping is configurable away with `stabilizationWindowSeconds`.
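To make the windowed-usage point concrete, here is a minimal bash sketch that computes a nearest-rank p95 next to the raw maximum. The `p95` helper and the sample values are illustrative assumptions, not output from a real cluster; in practice the samples would come from your metrics store.

```bash
# p95: nearest-rank 95th percentile of numbers read from stdin (one per line).
p95() {
  sort -n | awk '{ v[NR] = $1 }
    END { if (NR > 0) { i = int((NR * 95 + 99) / 100); print v[i] } }'
}

# Hypothetical memory samples in Mi: 19 steady readings plus one spike.
samples="$(seq 301 319; echo 900)"

echo "p95: $(printf '%s\n' "$samples" | p95)Mi"
echo "max: $(printf '%s\n' "$samples" | sort -n | tail -n 1)Mi"
```

With these samples the p95 stays near the steady-state readings while the maximum reflects the single spike, which is why sizing from one peak and sizing from a window can lead to very different limits.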
Useful commands
# Pod state and history
kubectl describe pod <pod>
kubectl get pod <pod> -o yaml | yq '.status.containerStatuses'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning
# Restart count per container (point-in-time snapshot; quoted so the shell does not glob [*])
kubectl get pods -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount'
# Current vs requested
kubectl top pod -n <ns> --containers
kubectl get pod <pod> -o jsonpath='{range .spec.containers[*]}{.name}{": req="}{.resources.requests}{" lim="}{.resources.limits}{"\n"}{end}'
# QoS class (computed)
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'
# Resource usage from metrics-server (last 1 min)
kubectl top pod -A --sort-by=memory | head -20
kubectl top pod -A --sort-by=cpu | head -20
# Detect CPU throttling straight from the cgroup (no Prometheus needed; the image must ship `cat`)
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu.stat # cgroup v2: nr_throttled, throttled_usec
# Cgroup v1 throttle stats: nr_throttled, throttled_time
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat
# HPA decision tracing
kubectl get hpa <hpa> -o yaml
kubectl describe hpa <hpa> # recommendations and conditions
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler
# VPA recommendation (off mode)
kubectl describe vpa <vpa> # shows recommended values without acting
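The raw `cpu.stat` counters are easier to judge as a ratio. The following is a minimal bash sketch, assuming the `nr_periods` and `nr_throttled` fields that both cgroup versions expose; the `throttle_pct` helper and the sample numbers are hypothetical.

```bash
# throttle_pct: percentage of CFS periods in which the container was throttled.
# Reads cpu.stat content on stdin.
throttle_pct() {
  awk '$1 == "nr_periods"   { periods = $2 }
       $1 == "nr_throttled" { throttled = $2 }
       END {
         if (periods > 0) { printf "%.1f\n", 100 * throttled / periods }
         else { print "0.0" }
       }'
}

# Illustrative cgroup v2 cpu.stat content (made-up numbers).
sample_stat='usage_usec 52000000
nr_periods 1000
nr_throttled 250
throttled_usec 8400000'

printf '%s\n' "$sample_stat" | throttle_pct

# Against a live pod (same placeholders as the commands above):
# kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu.stat | throttle_pct
```

The counters are cumulative since the container started, so comparing two readings taken a few minutes apart gives a better picture of current throttling than a single read.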
Decision matrix
| Symptom | Most likely cause | Fix direction |
|---|---|---|
| OOMKilled, RSS at limit | Limit too small for the real working set, or a leak | Raise mem request+limit together; investigate leak |
| OOMKilled, heap usage well below limit | JVM/Node native (off-heap) memory grew on top of the heap | Size limit 30-50% above heap for native overhead |
| Latency spikes, CPU below limit | CPU throttling on a bursty workload | Raise CPU limit; consider removing CPU limit on dedicated node pool |
| Pending pods | Requests can’t fit | Lower requests; scale cluster; check node taints |
| HPA flapping | Metric noise or wrong baseline | Add stabilizationWindowSeconds; right-size requests |
| Pod evicted (not OOM) | Node memory pressure | Pod is BestEffort/Burstable; raise to Guaranteed for critical |
| Restart with exit 1 (not 137) | App crash, not OOM | Check app logs, not memory |
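The restart rows of the matrix can be sketched as a small triage helper. This is a hypothetical bash function, not part of any tool; it assumes you pass it the `Reason` and `Exit Code` fields from `Last State` in `kubectl describe pod`.

```bash
# classify_restart REASON EXIT_CODE
# Maps Last State fields to the fix direction from the matrix above.
classify_restart() {
  local reason=$1 code=$2
  if [ "$reason" = "OOMKilled" ]; then
    echo "memory limit hit: raise request+limit together, check for a leak"
  elif [ "$code" -eq 137 ]; then
    echo "external SIGKILL: check probes, deletes, node events"
  elif [ "$code" -ne 0 ]; then
    echo "application crash: check app logs, not memory"
  else
    echo "clean exit: not covered by the matrix"
  fi
}

classify_restart OOMKilled 137
classify_restart Error 137
classify_restart Error 1
```

The point of the ordering is that the reason is checked before the exit code, since 137 alone does not prove an OOM.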
QoS class implications
# Guaranteed: no scheduling preference, but last to evict
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"        # same as request
    memory: "2Gi"   # same as request
# Burstable: usual choice for most workloads
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
# BestEffort: first to evict; only for very low-priority batch
# (no requests, no limits)
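As a rough illustration of how the class falls out of the spec, here is a bash sketch for a single-container pod. The `qos_class` helper is hypothetical and compares quantity strings literally (so `1` and `1000m` would not match); the authoritative answer is always `.status.qosClass` on the pod.

```bash
# qos_class REQ_CPU REQ_MEM LIM_CPU LIM_MEM
# Pass an empty string for any value that is not set in the spec.
qos_class() {
  local req_cpu=$1 req_mem=$2 lim_cpu=$3 lim_mem=$4
  if [ -z "$req_cpu$req_mem$lim_cpu$lim_mem" ]; then
    echo BestEffort
    return
  fi
  # An unset request defaults to the matching limit.
  : "${req_cpu:=$lim_cpu}" "${req_mem:=$lim_mem}"
  if [ -n "$lim_cpu" ] && [ -n "$lim_mem" ] &&
     [ "$req_cpu" = "$lim_cpu" ] && [ "$req_mem" = "$lim_mem" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

qos_class 1 2Gi 1 2Gi             # the Guaranteed example above
qos_class 100m 256Mi 500m 512Mi   # the Burstable example above
qos_class "" "" "" ""             # nothing set
```

For a multi-container pod the same check has to hold for every container, which is why a sidecar without resources changes the class of the whole pod.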
JVM container hardening
env:
  # Modern JVMs (8u191+, 11+, 17+): let the JVM size the heap from the container limit
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75 -XX:+UseG1GC -XshowSettings:vm"
Plus container resources:
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi" # JVM will size heap to ~1.5Gi, leaving 0.5Gi for native
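The heap and headroom figures in that comment are plain arithmetic. A minimal bash sketch, assuming the JVM applies the percentage to the container limit; the `heap_mib` helper is hypothetical and the real heap ceiling can differ slightly from the computed value.

```bash
# heap_mib LIMIT_MIB PERCENT: approximate max heap derived from the container limit.
heap_mib() { echo $(( $1 * $2 / 100 )); }

limit_mib=2048   # limits.memory: 2Gi
pct=75           # -XX:MaxRAMPercentage=75
heap=$(heap_mib "$limit_mib" "$pct")

echo "max heap ~${heap}Mi, native headroom ~$(( limit_mib - heap ))Mi"
```

Running the same arithmetic with your own limit shows quickly whether the remaining headroom is plausible for metaspace, thread stacks, and direct buffers.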
HPA hardening (avoid flapping)
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # 5 min before scaling down
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
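The effect of the scale-down stabilization window can be sketched as "act on the highest recommendation seen during the window". The bash helper below is a hypothetical illustration of that idea with made-up recommendations; it ignores the `policies` rate limits and min/max replica bounds.

```bash
# stabilized_scale_down REC...
# Given the replica recommendations recorded during the window,
# print the count a scale-down is allowed to reach (the highest one).
stabilized_scale_down() {
  local max=0 rec
  for rec in "$@"; do
    if [ "$rec" -gt "$max" ]; then max=$rec; fi
  done
  echo "$max"
}

# Recommendations seen over the last 300s of a noisy metric.
stabilized_scale_down 8 5 6 4
```

Because a brief dip to 4 is outvoted by the earlier 8, the replica count holds steady until the lower recommendation persists for the whole window, which is what damps flapping.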
Common findings this catches
- Sidecar without resources → drops the pod from Guaranteed to Burstable. Add matching request/limit on the sidecar.
- `limits.memory` 2× `requests.memory` → wide gap means OOM is delayed but not avoided when usage grows.
- JVM `-Xmx2g` in a container with `limits.memory: 2Gi` → guaranteed OOMKilled when native + GC overhead > 0. Use `MaxRAMPercentage`.
- HPA based on `cpu: 80%` but requests = 100m and actual usage = 50m → HPA sees 50% utilization; never scales. Right-size requests.
- CPU limit set on a Java app with many threads → severe throttling during GC; consider removing CPU limit.
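The HPA finding is easier to see with the numbers worked through. This bash sketch assumes the standard utilization-based calculation (usage relative to the request, then `ceil(current * utilization / target)`); the helper names are hypothetical, and the sketch ignores the controller's tolerance band and min/max replica bounds.

```bash
# hpa_utilization USAGE_M REQUEST_M: CPU utilization in percent as HPA measures it
# (usage relative to the request, not the limit). Inputs are millicores.
hpa_utilization() { echo $(( $1 * 100 / $2 )); }

# hpa_desired CURRENT_REPLICAS UTILIZATION TARGET:
# ceil(current * utilization / target) using integer arithmetic.
hpa_desired() { echo $(( ($1 * $2 + $3 - 1) / $3 )); }

util=$(hpa_utilization 50 100)
echo "utilization: ${util}%"
echo "desired replicas at 4 current, 80% target: $(hpa_desired 4 "$util" 80)"
```

Since the percentage is computed against the request, changing the request changes what the HPA sees even when real usage is identical, which is why right-sizing requests comes before tuning the target.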
When to escalate
- Cluster-wide eviction storms: node-level issue (disk pressure, kubelet config); not solvable in workload specs.
- VPA in `Auto` mode causing rolling evictions during incident: switch to `Off` immediately, evaluate later.
- HPA recommendations from external metrics (Prometheus adapter) that swing wildly: fix the metric pipeline first.