Kubernetes Resource Limits & OOMKilled Tuning Prompt

Tune CPU/memory requests and limits to stop OOMKilled restarts, fix CPU throttling, right-size HPA targets, and avoid noisy-neighbor scheduling issues.

Target user: Kubernetes platform engineers and SREs
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes platform engineer with deep experience tuning workload requests/limits and HPA configuration in production. You understand the difference between CPU throttling (silent and ugly) and OOMKilled (loud and visible), and you know that requests drive scheduling while limits drive enforcement.

I will provide:
- The symptom: `OOMKilled` pods, latency spikes (likely CPU throttling), pending pods (likely requests too high), HPA flapping, noisy-neighbor reports
- The workload's current resources spec: `kubectl get pod <p> -o yaml | yq '.spec.containers[].resources'`
- Recent usage: `kubectl top pod` (ideally peaks over a window) plus `kubectl describe pod` (Last State, Restart Count, Exit Code)
- Container runtime (containerd / CRI-O); cgroup version (v1 or v2)
- If HPA configured: `kubectl get hpa <name> -o yaml` + recent events
- Application context: language runtime (JVM, Go, Python, Node), whether the workload is steady-state or bursty

Your job:

1. **Decode the failure mode**:
   - **`OOMKilled` (exit code 137)** → memory limit exceeded; cgroup OOM
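   - **`Error` with exit 137 and no OOM in describe** → external SIGKILL (137 = 128 + signal 9), e.g. the termination grace period expired after SIGTERM, or a forced delete (`kubectl delete --force --grace-period=0`)
   - **Restarts climbing (`kube_pod_container_status_restarts_total`) with Last State other than `OOMKilled` and `Unhealthy` (liveness probe failed) events** → liveness probe failures, not OOM
   - **Latency spikes with CPU usage well below limit** → CPU throttling (`container_cpu_cfs_throttled_seconds_total`); averaged usage hides bursts that use up the CFS quota within a single 100ms period
   - **Pending pods, FailedScheduling** → requests can't fit on any node
   - **HPA flapping** → noisy metric, missing stabilization window, or requests that don't match real usage (HPA measures utilization against requests)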
2. **Apply the right model**:
   - **Requests** = what the scheduler reserves. Too low = noisy neighbor. Too high = pending pods.
   - **Limits** = the cap. Memory limit hit = OOMKilled (immediate). CPU limit hit = throttled (not killed).
   - **QoS class** is computed from request vs limit equality:
     - `Guaranteed` = both set and equal, on every container
     - `Burstable` = at least one request or limit set, not Guaranteed
     - `BestEffort` = nothing set; first to be evicted under pressure
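   - **`Guaranteed` pods get the lowest OOM-kill priority** (`oom_score_adj` -997) and are the last candidates for node-pressure eviction; use for critical workloads. The scheduler itself does not place them differently from other pods.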
3. **Identify the right tuning direction**:
   - **OOMKilled**: raise memory limit AND requests in tandem (don't widen the gap; that just delays the OOM). Investigate why memory is growing (real working set, leak, GC heap not tuned).
   - **CPU throttling under low usage**: limit too low relative to bursty traffic; raise OR remove limit (many shops run with no CPU limit — controversial but defensible).
   - **Pending**: lower requests, or scale the cluster.
   - **HPA flapping**: tune the `behavior` block (`stabilizationWindowSeconds`, scaling policies); check that requests match actual usage so HPA's % math is meaningful.
4. **For JVM / runtime-specific issues**:
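   - JVM heap not sized to container memory → set `-XX:MaxRAMPercentage=75.0` (JDK 8u191+ and 10+) instead of a hard-coded `-Xmx`
   - Native memory growth (off-heap) → memory grows past `-Xmx`; size the container `limit` 30-50% above the heap to cover native memory
   - Node/V8: same story → set `--max-old-space-size` (in MiB) below the container memory limit, leaving headroom for buffers and native memory
   - Go runtime: does not cap its heap from the cgroup memory limit on its own; since Go 1.19, set `GOMEMLIMIT` a little below the container limit so the GC works harder before the cgroup OOM hits
   - Python: no runtime heap cap, so memory grows until the cgroup limit; OOMKilled is usually the first visible sign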
5. **Cgroup v2 implications** (modern clusters):
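   - `memory.max` is set from the container's memory limit. `memory.min` (from requests) and `memory.high` are set only when the alpha `MemoryQoS` feature gate is enabled, which it is not by default
   - CPU throttling shows in `cpu.stat` as `nr_throttled` / `nr_periods` and `throttled_usec`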
6. **For each recommendation**: state the BEFORE value, the AFTER value, the metric to watch, and the rollback if it doesn't help.

Mark anything high-risk or hard to undo safely (removing limits cluster-wide, lowering requests on a Guaranteed-class critical workload) as DESTRUCTIVE.

---

Symptom: [OOMKilled / throttling / pending / HPA flap / other]
Workload: [Deployment/StatefulSet/Job name + namespace]
Current resources spec:
```yaml
[PASTE .spec.containers[*].resources]
```
Pod describe (Last State, Reason, Exit Code, Restart Count):
```
[PASTE relevant section]
```
Usage (kubectl top + recent peaks):
```
[PASTE]
```
HPA (if configured):
```yaml
[PASTE]
```
App runtime + memory model:
[DESCRIBE — JVM with -Xmx? Go? Python? heap vs RSS expectations]


Safety notes

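Changing resource requests or limits in a Deployment or StatefulSet pod template triggers a rolling replacement of all pods. Plan for the restart, especially for StatefulSets, which replace pods one at a time.
Removing CPU limits ('limitless') is a viable strategy for latency-sensitive workloads, but keep CPU requests set and prefer dedicated node pools; otherwise one busy pod can soak up all spare CPU on a node and starve neighbors that rely on bursting above their requests.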
Setting requests too high can leave nodes underutilized and increase cost. Right-size to p95-p99 of actual usage, not p100.
QoS `Guaranteed` requires request == limit for both CPU and memory on ALL containers in the pod, INCLUDING sidecars and init containers. Forgetting the sidecar drops the pod to Burstable.
HPA computes utilization as a percentage of the container `requests`, not of limits or node capacity. If requests are wrong, HPA percentages are meaningless.
Lowering the memory limit below what a workload actually uses gets the new pods OOMKilled as soon as they reach their normal working set after the rollout.
VPA (Vertical Pod Autoscaler) in `Auto` mode evicts pods to resize them. When in doubt, run it in `Off` (recommendation only) mode first.
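The right-sizing and limit-lowering notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The helper names are hypothetical. Samples are working-set readings in MiB collected over a window, for example from your metrics stack. The request uses a nearest-rank p95. The 20% limit headroom over the observed peak is an assumed policy, not a Kubernetes default.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 0-100."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct% of n (1-based rank)
    return ordered[max(rank, 1) - 1]


def suggest_memory(samples_mib, request_pct=95, headroom_pct=20):
    """Return (request, limit) in MiB.

    request: the request_pct percentile of observed usage (p95 by default).
    limit:   observed peak plus headroom_pct percent (an assumed policy), rounded up.
    """
    request = percentile(samples_mib, request_pct)
    peak = max(samples_mib)
    limit = -(-peak * (100 + headroom_pct) // 100)  # integer ceiling division
    return request, limit


def limit_is_safe(proposed_limit_mib, samples_mib):
    """A limit at or below the observed peak will OOMKill pods once they reach that working set."""
    return proposed_limit_mib > max(samples_mib)


# Hypothetical usage: 20 working-set samples from 100 MiB to 2000 MiB.
samples = list(range(100, 2100, 100))
print(suggest_memory(samples))       # (1900, 2400)
print(limit_is_safe(1500, samples))  # False
```

Integer arithmetic keeps the rounding exact. Swap in p99 by passing `request_pct=99` if the workload's tail matters more than cost.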

Why this prompt works

Resource tuning is the most-asked Kubernetes question and the most-frequently-wrong answer. “Just raise the limit” buries leaks; “remove the limit” lets one pod consume a node. The right answer is workload-specific and depends on whether you’re seeing throttling, OOM, or eviction — and what the app’s memory model actually is. This prompt forces a diagnostic step before a tuning step.

How to use it

Always include Last State from kubectl describe pod. It distinguishes OOM from livenessProbe restarts.
Include actual usage from a window, not a single point. p95 over 24h tells a different story than a one-off peak.
Mention the runtime. JVM and Node need explicit heap config matched to container limits; Go has no hard heap cap but benefits from `GOMEMLIMIT`.
For HPA tuning, include the HPA’s behavior block — most flapping is configurable away with stabilizationWindowSeconds.
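To show why Last State matters, here is a hypothetical triage helper. It maps `lastState.terminated` (reason and exit code), plus a count of liveness-probe failure events that you gather separately, to a likely cause. It assumes the common 128 + signal exit-code convention (137 = SIGKILL, 143 = SIGTERM). It is a sketch, not an exhaustive decoder.

```python
def classify_last_state(reason, exit_code, liveness_failures=0):
    """Map a container's lastState.terminated (reason, exitCode) to a likely cause.

    liveness_failures: count of "Liveness probe failed" (Unhealthy) events for the pod,
    gathered separately, e.g. from `kubectl get events`.
    """
    if reason == "OOMKilled":
        return "oom"                         # cgroup memory limit exceeded
    if liveness_failures > 0:
        return "liveness"                    # kubelet restarted the container after probe failures
    if exit_code == 137:
        return "sigkill"                     # 128 + 9: grace period expired or forced delete
    if exit_code == 143:
        return "sigterm"                     # 128 + 15: normal shutdown request
    if exit_code == 0:
        return "clean-exit"
    if exit_code > 128:
        return f"signal-{exit_code - 128}"   # killed by another signal, e.g. 139 = SIGSEGV
    return "app-error"                       # non-zero app exit: read the logs, not memory metrics


for state in [("OOMKilled", 137, 0), ("Error", 137, 3), ("Error", 137, 0), ("Error", 1, 0)]:
    print(state, "->", classify_last_state(*state))
```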

Useful commands

```shell
# Pod state and history
kubectl describe pod <pod>
kubectl get pod <pod> -o yaml | yq '.status.containerStatuses'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning

# Restart counts per pod (current totals, not a time series)
kubectl get pods -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount'

# Current vs requested
kubectl top pod -n <ns> --containers
kubectl get pod <pod> -o jsonpath='{range .spec.containers[*]}{.name}{": req="}{.resources.requests}{" lim="}{.resources.limits}{"\n"}{end}'

# QoS class (computed)
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'

# Top consumers from metrics-server (a short recent sample, not history)
kubectl top pod -A --sort-by=memory | head -20
kubectl top pod -A --sort-by=cpu | head -20

# Detect CPU throttling from inside the container (needs `cat` in the image; no Prometheus required).
# In Prometheus, use container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total.
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu.stat   # cgroup v2

# Cgroup v1 throttle stats
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat

# HPA decision tracing
kubectl get hpa <hpa> -o yaml
kubectl describe hpa <hpa>     # recommendations and conditions
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler

# VPA recommendation (Off mode)
kubectl describe vpa <vpa>     # shows recommended values without acting
```
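The `cpu.stat` output from the commands above can be turned into a throttling ratio. This is a minimal sketch that assumes you paste in the raw file contents. cgroup v2 reports `throttled_usec` (microseconds) and cgroup v1 reports `throttled_time` (nanoseconds); both expose `nr_periods` and `nr_throttled`. The counters are cumulative since the container started, so compare two snapshots taken a known interval apart to get a current rate. The function names are illustrative.

```python
def parse_cpu_stat(text):
    """Parse `cpu.stat` ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_seconds(stats):
    """Total time spent throttled, in seconds, for cgroup v2 or v1 counters."""
    if "throttled_usec" in stats:                          # cgroup v2: microseconds
        return stats["throttled_usec"] / 1_000_000
    return stats.get("throttled_time", 0) / 1_000_000_000  # cgroup v1: nanoseconds


def throttle_ratio(before, after):
    """Fraction of CFS enforcement periods that were throttled between two snapshots.

    A container without a CPU limit has no CFS quota, so nr_periods normally stays at 0
    and the ratio is reported as 0.
    """
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    return throttled / periods if periods > 0 else 0.0


# Hypothetical snapshots taken a few minutes apart (cgroup v2 format).
before = parse_cpu_stat("usage_usec 9000000\nnr_periods 1000\nnr_throttled 250\nthrottled_usec 5000000\n")
after = parse_cpu_stat("usage_usec 12000000\nnr_periods 1600\nnr_throttled 550\nthrottled_usec 9000000\n")
print(f"throttled in {throttle_ratio(before, after):.0%} of periods")
```

A high ratio while `kubectl top` shows usage well below the limit is the throttling pattern described in the prompt.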

Decision matrix

| Symptom | Most likely cause | Fix direction |
|---|---|---|
| OOMKilled, RSS at limit | Limit too small for the real working set | Raise mem request+limit together; investigate leak |
| OOMKilled, heap (JVM/V8 metrics) well below limit | Native/off-heap memory grew past the heap | Size limit ~30-50% above heap for native overhead |
| Latency spikes, CPU below limit | CPU throttling on a bursty workload | Raise CPU limit; consider removing CPU limit on dedicated node pool |
| Pending pods | Requests can’t fit | Lower requests; scale cluster; check node taints |
| HPA flapping | Metric noise or wrong baseline | Add `stabilizationWindowSeconds`; right-size requests |
| Pod evicted (not OOM) | Node memory pressure | Pod is BestEffort/Burstable; raise to Guaranteed for critical |
| Restart with exit 1 (not 137) | App crash, not OOM | Check app logs, not memory |

QoS class implications

```yaml
# Guaranteed: lowest OOM-kill priority, last to evict
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"        # same as request
    memory: "2Gi"   # same as request
```

```yaml
# Burstable: usual choice for most workloads
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

```yaml
# BestEffort: first to evict; only for very low-priority batch
# (no requests, no limits)
```
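The QoS rules above can be expressed as a small Python function. This sketch makes several assumptions. Container specs are plain dicts shaped like the `resources` blocks shown. Only CPU and memory are considered. A missing request takes its limit's value, mirroring API-server defaulting. Only a subset of quantity suffixes is parsed (`m`, `k`/`M`/`G`/`T`, `Ki`/`Mi`/`Gi`/`Ti`). Pass init containers and sidecars too, since they count toward the class. `.status.qosClass` from the cluster remains the authoritative answer.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes; two-letter binary suffixes are checked first.
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
    ("m", Fraction(1, 1000)),
]


def parse_quantity(value):
    """Parse a quantity such as "500m", "1", or "2Gi" into an exact Fraction."""
    text = str(value)
    for suffix, factor in _SUFFIXES:
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def _cpu_mem(section):
    """Keep only cpu/memory entries of a requests or limits map, parsed to Fractions."""
    return {k: parse_quantity(v) for k, v in (section or {}).items() if k in ("cpu", "memory")}


def qos_class(containers):
    """Compute the QoS class for a pod from ALL its containers (init, app, and sidecars)."""
    anything_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources") or {}
        limits = _cpu_mem(resources.get("limits"))
        requests = dict(limits)                  # a missing request defaults to its limit
        requests.update(_cpu_mem(resources.get("requests")))
        if limits or requests:
            anything_set = True
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


app = {"name": "app", "resources": {"requests": {"cpu": "1", "memory": "2Gi"},
                                    "limits": {"cpu": "1000m", "memory": "2Gi"}}}
sidecar = {"name": "proxy"}  # hypothetical sidecar with no resources block
print(qos_class([app]))           # Guaranteed
print(qos_class([app, sidecar]))  # Burstable
```

Quantities are compared numerically, so `"1"` and `"1000m"` count as equal. The sidecar case reproduces the "forgotten sidecar" finding.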

JVM container hardening

```yaml
env:
  # Container-aware JVMs (8u191+, 10+) size the heap from the container memory limit
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC -XshowSettings:vm"
```

Plus container resources:

```yaml
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"   # JVM will size heap to ~1.5Gi, leaving ~0.5Gi for native memory
```
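The arithmetic behind the 2Gi example can be checked with a few lines of Python. This sketch assumes the JVM takes `MaxRAMPercentage` of the container limit as its max heap; the real value is rounded and aligned slightly differently. The 30% overhead threshold mirrors the 30-50% rule of thumb in the prompt. The helper names are hypothetical.

```python
GIB = 1024 ** 3


def jvm_heap_bytes(limit_bytes, max_ram_percentage=75.0):
    """Approximate max heap the JVM derives from the container limit (real JVMs align this slightly)."""
    return int(limit_bytes * max_ram_percentage / 100)


def limit_covers_native(limit_bytes, heap_bytes, overhead_pct=30):
    """True if the container limit is at least overhead_pct percent above the max heap."""
    return limit_bytes * 100 >= heap_bytes * (100 + overhead_pct)


limit = 2 * GIB
heap = jvm_heap_bytes(limit)
print(f"heap {heap / GIB:.2f} GiB, native headroom {(limit - heap) / GIB:.2f} GiB")
print(limit_covers_native(limit, heap))     # True: MaxRAMPercentage=75.0 with a 2Gi limit
print(limit_covers_native(limit, 2 * GIB))  # False: -Xmx2g with a 2Gi limit
```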

HPA hardening (avoid flapping)

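```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # act on the highest recommendation from the last 5 min
      policies:
      - type: Percent
        value: 50                        # remove at most 50% of current replicas...
        periodSeconds: 60                # ...per 60 s
    scaleUp:
      stabilizationWindowSeconds: 0      # scale up immediately
      policies:
      - type: Percent
        value: 100                       # at most double the replicas...
        periodSeconds: 30                # ...per 30 s
```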
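To see how these settings interact with HPA math, the sketch below follows the replica formula documented for the HPA controller: `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`, with the default 10% tolerance. It approximates scale-down stabilization as "take the highest recommendation inside the window". It ignores min/max replicas, scaling policies, and unready pods. Function names and inputs are illustrative.

```python
import math


def utilization_pct(usage_millicores, request_millicores):
    """Resource utilization as the HPA sees it: total usage over total *requests*, in percent."""
    return 100 * sum(usage_millicores) / sum(request_millicores)


def desired_replicas(current_replicas, current_util, target_util, tolerance=0.1):
    """ceil(current * current/target), skipping changes inside the tolerance band."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def stabilized_scale_down(history, now, window_seconds=300):
    """history: (timestamp_seconds, recommended_replicas) pairs, including the current one.

    Scale-down acts on the highest recommendation inside the window, so a short dip
    in load does not remove pods (an approximation of stabilizationWindowSeconds).
    """
    return max(replicas for ts, replicas in history if now - ts <= window_seconds)


# Hypothetical: two pods requesting 2000m each but saturating near 1000m.
# HPA sees 50% utilization against an 80% target and never scales up.
util = utilization_pct([1000, 1000], [2000, 2000])
print(util, desired_replicas(2, util, 80))
```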

Common findings this catches

Sidecar without resources → drops the pod from Guaranteed to Burstable. Add matching request/limit on the sidecar.
limits.memory 2× requests.memory → wide gap means OOM is delayed but not avoided when usage grows.
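JVM -Xmx2g in a container with limits.memory: 2Gi → OOMKilled as soon as the heap fills, because metaspace, thread stacks, and other native memory sit on top of the 2 GiB heap. Use MaxRAMPercentage.
HPA targets cpu: 80%, but requests = 2 CPU for a single-threaded process (e.g. Node.js) that saturates at about 1 CPU → HPA never sees more than ~50% utilization and never scales up, even as latency climbs. Right-size requests.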
CPU limit set on a Java app with many threads → severe throttling during GC; consider removing CPU limit.

When to escalate

Cluster-wide eviction storms — node-level issue (disk pressure, kubelet config); not solvable in workload specs.
VPA in Auto mode causing rolling evictions during an incident: switch it to Off immediately and evaluate later.
HPA recommendations from external metrics (Prometheus adapter) that swing wildly — fix the metric pipeline first.
