
Kubernetes Resource Limits & OOMKilled Tuning Prompt

Tune CPU/memory requests and limits to stop OOMKilled, fix throttling, right-size HPA targets, and avoid noisy-neighbor scheduling issues.

Target user
Kubernetes platform engineers and SREs
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Kubernetes platform engineer with deep experience tuning workload requests/limits and HPA configuration in production. You understand the difference between CPU throttling (silent and ugly) and OOMKilled (loud and visible), and you know that requests drive scheduling while limits drive enforcement.

I will provide:
- The symptom: `OOMKilled` pods, latency spikes (likely CPU throttling), pending pods (likely requests too high), HPA flapping, noisy-neighbor reports
- The workload's current resources spec: `kubectl get pod <p> -o yaml | yq '.spec.containers[].resources'`
- Recent usage metrics: `kubectl top pod` and ideally `kubectl describe pod` (Last State, Restart Count, Exit Code)
- Container runtime (containerd / CRI-O); cgroup version (v1 or v2)
- If HPA configured: `kubectl get hpa <name> -o yaml` + recent events
- Application context: language runtime (JVM, Go, Python, Node), whether the workload is steady-state or bursty

Your job:

1. **Decode the failure mode**:
   - **`OOMKilled` (exit code 137)** → memory limit exceeded; cgroup OOM
   - **`Error` with exit 137 and no OOM in describe** → external SIGKILL (might be `kubectl delete --grace-period=0`)
   - **Restart count climbing (`kube_pod_container_status_restarts_total`) with no `OOMKilled` reason in Last State** → liveness probe failures, not OOM
   - **Latency spikes with CPU usage well below limit** → CPU throttling (`container_cpu_cfs_throttled_seconds_total`)
   - **Pending pods, FailedScheduling** → requests can't fit on any node
   - **HPA flapping** → noisy metric, or requests that don't match real usage (HPA utilization is computed against requests, not limits)
2. **Apply the right model**:
   - **Requests** = what the scheduler reserves. Too low = noisy neighbor. Too high = pending pods.
   - **Limits** = the cap. Memory limit hit = OOMKilled (immediate). CPU limit hit = throttled (not killed).
   - **QoS class** is computed from request vs limit equality:
     - `Guaranteed` = CPU and memory each have a request and a limit, and request equals limit, on every container
     - `Burstable` = at least one request or limit set, not Guaranteed
     - `BestEffort` = nothing set; first to be evicted under pressure
   - **`Guaranteed` pods are the last to be evicted and the least likely to be picked by the node OOM killer**; QoS class does not change scheduling, which is driven by requests. Use it for critical workloads.
3. **Identify the right tuning direction**:
   - **OOMKilled**: raise memory limit AND requests in tandem (don't widen the gap; that just delays the OOM). Investigate why memory is growing (real working set, leak, GC heap not tuned).
   - **CPU throttling under low usage**: limit too low relative to bursty traffic; raise OR remove limit (many shops run with no CPU limit — controversial but defensible).
   - **Pending**: lower requests, or scale the cluster.
   - **HPA flapping**: stabilizationWindow / behavior tuning; check that requests match actual usage so HPA's % math is meaningful.
4. **For JVM / runtime-specific issues**:
   - JVM heap not sized to container memory → set `-XX:MaxRAMPercentage=75` (modern JVMs) instead of `-Xmx` hard-coded
   - Native memory growth (off-heap) → memory grows past `-Xmx`; size container `limit` 30-50% above heap for native
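   - Node/V8 same story → set `--max-old-space-size` (in MiB) below the container memory limit
   - Go runtime: does not read the cgroup memory limit on its own; set `GOMEMLIMIT` (Go 1.19+, a soft limit) somewhat below the container limit so the GC works harder before the cgroup OOM
   - Python: the interpreter has no heap cap of its own; `OOMKilled` is usually the first indicator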
5. **Cgroup v2 implications** (modern clusters):
   - The kubelet writes the memory limit to `memory.max`; `memory.min` (derived from requests) and `memory.high` are only set when the `MemoryQoS` feature gate is enabled, which is not the default
   - CPU throttling shows in `cpu.stat` `throttled_usec`
6. **For each recommendation**: state the BEFORE value, the AFTER value, the metric to watch, and the rollback if it doesn't help.

Mark anything irreversible (removing limits cluster-wide, lowering requests on a Guaranteed-class critical workload) DESTRUCTIVE.

---

Symptom: [OOMKilled / throttling / pending / HPA flap / other]
Workload: [Deployment/StatefulSet/Job name + namespace]
Current resources spec:
```yaml
[PASTE .spec.containers[*].resources]
```
Pod describe (Last State, Reason, Exit Code, Restart Count):
```
[PASTE relevant section]
```
Usage (kubectl top + recent peaks):
```
[PASTE]
```
HPA (if configured):
```yaml
[PASTE]
```
App runtime + memory model:
[DESCRIBE — JVM with -Xmx? Go? Python? heap vs RSS expectations]

Why this prompt works

Resource tuning is the most-asked Kubernetes question and the most-frequently-wrong answer. “Just raise the limit” buries leaks; “remove the limit” lets one pod consume a node. The right answer is workload-specific and depends on whether you’re seeing throttling, OOM, or eviction — and what the app’s memory model actually is. This prompt forces a diagnostic step before a tuning step.

How to use it

  1. Always include Last State from kubectl describe pod. It distinguishes OOM from livenessProbe restarts.
  2. Include actual usage from a window, not a single point. p95 over 24h tells a different story than a single p100 spike.
  3. Mention the runtime. JVM and Node need explicit heap config matched to container limits; Go usually doesn’t.
  4. For HPA tuning, include the HPA’s behavior block — most flapping is configurable away with stabilizationWindowSeconds.
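
To make the window-versus-point distinction in step 2 concrete, here is a minimal sketch that computes a nearest-rank percentile from a column of samples. The `percentile` helper and the sample values are illustrative assumptions; in practice the samples would come from your metrics store, since `kubectl top` only reports a recent value.

```shell
# percentile P: read one numeric sample per line on stdin and print the
# nearest-rank P-th percentile (P is an integer from 1 to 100).
# Hypothetical helper for illustration, not part of kubectl.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer ceil(p * NR / 100)
      print v[rank]
    }'
}

# Made-up memory samples in MiB: steady around 400 with one 900 spike.
samples="400 402 398 405 401 399 403 404 400 402 406 410 397 401 403 900 402 400 399 404"
echo "p95:  $(printf '%s\n' $samples | percentile 95)"
echo "p100: $(printf '%s\n' $samples | percentile 100)"
```

With these samples p95 is 410 while p100 is 900, so a limit sized from the single peak would be more than double what the steady state suggests. Whether the spike is real load or an anomaly is exactly the question to put to the prompt.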

Useful commands

# Pod state and history
kubectl describe pod <pod>
kubectl get pod <pod> -o yaml | yq '.status.containerStatuses'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning

# Restart count per pod (current value; re-run or query Prometheus to see the trend)
kubectl get pods -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount'

# Current vs requested
kubectl top pod -n <ns> --containers
kubectl get pod <pod> -o jsonpath='{range .spec.containers[*]}{.name}{": req="}{.resources.requests}{" lim="}{.resources.limits}{"\n"}{end}'

# QoS class (computed)
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'

# Resource usage from metrics-server (last 1 min)
kubectl top pod -A --sort-by=memory | head -20
kubectl top pod -A --sort-by=cpu | head -20

# Detect CPU throttling straight from the cgroup (nr_throttled, throttled_usec); Prometheus exposes the same data as container_cpu_cfs_throttled_seconds_total
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu.stat   # cgroup v2

# Cgroup v1 throttle stats
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat

# HPA decision tracing
kubectl get hpa <hpa> -o yaml
kubectl describe hpa <hpa>     # recommendations and conditions
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler

# VPA recommendation (off mode)
kubectl describe vpa <vpa>     # shows recommended values without acting
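
The raw `cpu.stat` counters are easier to judge as a ratio. The sketch below, built around a hypothetical `throttle_pct` helper, turns them into the percentage of CFS periods that were throttled. It assumes the file contains `nr_periods` and `nr_throttled` lines, which both cgroup v1 and v2 provide when a CPU limit is set; the counters are cumulative since the container started, so compare two readings to see the current rate.

```shell
# throttle_pct: read a cpu.stat file on stdin and print the percentage of
# CFS periods in which the container was throttled.
# Hypothetical helper for illustration.
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"   # no CPU limit, or no periods recorded yet
    }'
}

# Against a live pod (placeholders as in the commands above):
#   kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu.stat | throttle_pct

# Self-contained demo with made-up counters:
printf 'nr_periods 2000\nnr_throttled 500\nthrottled_usec 7300000\n' | throttle_pct
```

The demo prints 25.0, meaning one period in four hit the quota. What counts as too much is workload-specific, so treat any threshold as a starting point rather than a rule.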

Decision matrix

| Symptom | Most likely cause | Fix direction |
|---|---|---|
| OOMKilled, RSS at limit | Limit too small for the real working set | Raise mem request+limit together; investigate leak |
| OOMKilled, heap well below limit | JVM/Node native memory grew past heap | Size limit 30-50% above heap for native overhead |
| Latency spikes, CPU below limit | CPU throttling on a bursty workload | Raise CPU limit; consider removing CPU limit on dedicated node pool |
| Pending pods | Requests can’t fit | Lower requests; scale cluster; check node taints |
| HPA flapping | Metric noise or wrong baseline | Add stabilizationWindowSeconds; right-size requests |
| Pod evicted (not OOM) | Node memory pressure | Pod is BestEffort/Burstable; raise to Guaranteed for critical |
| Restart with exit 1 (not 137) | App crash, not OOM | Check app logs, not memory |
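
The restart rows of the matrix reduce to two fields from Last State: the reason and the exit code. The sketch below encodes that triage in a hypothetical `classify_restart` helper. The three output labels are made up for illustration, and the function deliberately ignores everything else in the pod status.

```shell
# classify_restart REASON EXIT_CODE: map the Last State fields from
# `kubectl describe pod` to a first diagnosis.
# Hypothetical helper; labels are illustrative only.
classify_restart() {
  case "$1:$2" in
    OOMKilled:*) echo "oom: container hit its memory limit" ;;
    *:137)       echo "sigkill: killed from outside, not a cgroup OOM" ;;
    *)           echo "crash: application error, check logs" ;;
  esac
}

classify_restart OOMKilled 137
classify_restart Error 137
classify_restart Error 1
```

Only the first case is a memory-tuning problem. The other two should send you to events and application logs before anyone touches the resources spec.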

QoS class implications

# Guaranteed: last to evict under node pressure (scheduling still follows requests)
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"        # same as request
    memory: "2Gi"   # same as request

# Burstable — usual choice for most workloads
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

# BestEffort — first to evict; only for very low-priority batch
# (no requests, no limits)
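
The classification rules can be checked mechanically. This is a simplified sketch for a single-container pod, using a hypothetical `qos_class` helper. It assumes quantities are written identically on both sides (it compares strings, so "1" and "1000m" would wrongly count as different) and it does not handle multi-container pods, where every container must qualify for the pod to be Guaranteed.

```shell
# qos_class CPU_REQ CPU_LIM MEM_REQ MEM_LIM: QoS class for a single-container
# pod. Pass "" for an unset field. Hypothetical helper for illustration.
qos_class() {
  cpu_req=$1; cpu_lim=$2; mem_req=$3; mem_lim=$4
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo BestEffort
    return
  fi
  # An unset request defaults to the limit for that resource.
  if [ -z "$cpu_req" ]; then cpu_req=$cpu_lim; fi
  if [ -z "$mem_req" ]; then mem_req=$mem_lim; fi
  if [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
     [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

qos_class 1 1 2Gi 2Gi              # first spec above
qos_class 100m 500m 256Mi 512Mi    # second spec above
qos_class "" "" "" ""              # nothing set
```

The three calls print Guaranteed, Burstable, and BestEffort. On a real cluster, `.status.qosClass` is the source of truth.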

JVM container hardening

env:
  # Modern JVMs (8u191+, 11+, 17+) — let JVM size heap from container limit
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75 -XX:+UseG1GC -XshowSettings:vm"

Plus container resources:

resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"   # JVM will size heap to ~1.5Gi, leaving 0.5Gi for native
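
The arithmetic behind that comment can be reproduced for any limit. The sketch below uses two hypothetical helpers, `heap_mib` and `native_headroom_mib`, and assumes the JVM takes the percentage of the container limit more or less literally; the real JVM may round or align the value slightly differently.

```shell
# heap_mib LIMIT_MIB PERCENT: approximate max heap for a given
# MaxRAMPercentage (integer percent). Hypothetical helper.
heap_mib() { echo $(( $1 * $2 / 100 )); }

# native_headroom_mib LIMIT_MIB PERCENT: what is left for metaspace, thread
# stacks, direct buffers and other off-heap memory. Hypothetical helper.
native_headroom_mib() { echo $(( $1 - $1 * $2 / 100 )); }

limit_mib=2048   # matches limits.memory: "2Gi" above
echo "heap:   $(heap_mib "$limit_mib" 75) MiB"
echo "native: $(native_headroom_mib "$limit_mib" 75) MiB"
```

For a 2Gi limit this gives 1536 MiB of heap and 512 MiB of headroom. If native usage is known to be larger, lowering the percentage is usually simpler than hard-coding `-Xmx`.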

HPA hardening (avoid flapping)

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # 5 min before scaling down
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
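
The behavior block shapes how quickly a new replica count is applied, but the count itself comes from the documented HPA formula: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces only that core step in a hypothetical `hpa_desired` helper; it ignores min/max replicas, missing metrics, and unready pods.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL
# Core replica calculation only; hypothetical helper for illustration.
hpa_desired() {
  awk -v c="$1" -v u="$2" -v t="$3" 'BEGIN {
    ratio = u / t
    diff = ratio - 1; if (diff < 0) diff = -diff
    if (diff <= 0.1) { print c; exit }     # inside tolerance: no change
    d = c * ratio
    r = int(d); if (r < d) r++             # ceil
    print r
  }'
}

hpa_desired 4 160 80   # utilization at twice the target
hpa_desired 4 84 80    # slightly above target, inside tolerance
hpa_desired 4 50 80    # well below target
```

The calls print 8, 4, and 3. A metric that oscillates across the tolerance band produces alternating results like these, which is what the scale-down stabilization window is there to absorb.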

Common findings this catches

  • Sidecar without resources → drops the pod from Guaranteed to Burstable. Add matching request/limit on the sidecar.
  • limits.memory ≫ requests.memory → wide gap means OOM is delayed but not avoided when usage grows.
  • JVM -Xmx2g in a container with limits.memory: 2Gi → OOMKilled once the heap fills, because native + GC overhead is always > 0. Use MaxRAMPercentage.
  • HPA based on cpu: 80% but requests = 100m and actual usage = 50m → HPA sees 50% utilization; never scales. Right-size requests.
  • CPU limit set on a Java app with many threads → severe throttling during GC; consider removing CPU limit.
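
The HPA finding comes down to one division: CPU utilization as the HPA sees it is usage divided by the request, not by the limit. A minimal sketch with a hypothetical `utilization_pct` helper, using the 50m/100m figures from the list and an assumed right-sized request of 60m:

```shell
# utilization_pct USAGE_MILLICORES REQUEST_MILLICORES
# Integer percentage, rounded down. Hypothetical helper for illustration.
utilization_pct() { echo $(( $1 * 100 / $2 )); }

utilization_pct 50 100   # request far above usage
utilization_pct 50 60    # request sized closer to usage (assumed value)
```

The first call prints 50, which never crosses an 80% target; the second prints 83, which does. The 60m value is only an example, and the right request should come from observed usage over a representative window.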

When to escalate

  • Cluster-wide eviction storms — node-level issue (disk pressure, kubelet config); not solvable in workload specs.
  • VPA in Auto mode causing rolling evictions during incident — switch to Off immediately, evaluate later.
  • HPA recommendations from external metrics (Prometheus adapter) that swing wildly — fix the metric pipeline first.
