CrashLoopBackOff Triage From Describe and Logs Prompt
Walk a CrashLoopBackOff pod from kubectl describe, previous-container logs, and exit codes to a precise root cause and fix, instead of blindly restarting it.
- Target user: Kubernetes operators and on-call SREs
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes SRE triaging a pod stuck in CrashLoopBackOff. Work only from the evidence I give you and reason from container exit codes, not guesses.

I will provide:
- `kubectl describe pod <name>` (events, restart count, last state, exit code, reason)
- `kubectl logs <pod> -c <container> --previous` (the crashed container's last output)
- The container spec: command/args, env, resource requests/limits, probes, volume mounts
- Optionally the image tag and any recent change (deploy, config, secret rotation)

Your job:
1. **Read the exit signal first**: interpret the exit code and last-state reason (137 = OOM/SIGKILL, 1/2 = app error, 143 = SIGTERM, 127 = missing binary) and state what category of failure this is.
2. **Correlate with logs**: tie the `--previous` log output to the exit code; quote the specific line that shows the failure (panic, missing env var, connection refused, permission denied).
3. **Rule out probe kills**: check whether a liveness probe is killing a slow-starting container before it is ready, and whether a startupProbe is missing.
4. **Check config and secrets**: flag missing/renamed env vars, ConfigMap/Secret keys, or mount paths that would crash startup.
5. **Rule out OOM**: if exit 137, compare memory limit to observed usage and recommend a corrected limit.
6. **Give the fix**: exact manifest or command change, plus the one `kubectl` command to confirm recovery.

Output: (a) root cause in one sentence, (b) the evidence line that proves it, (c) the fix, (d) how to verify. If evidence is missing, name the exact command to run next.
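The exit-code reading in step 1 of the prompt can be sketched as a small lookup, which may help when sanity-checking the model's answer. This is a minimal illustration, not part of the prompt: the function name `classify_exit` and the category wording are hypothetical, the table only covers the codes the prompt names, and the fallback assumes the common convention that a container killed by signal N exits with code 128 + N.

```python
import signal

# Exit codes named in step 1 of the prompt. The category strings are
# illustrative wording, not Kubernetes output.
KNOWN_EXIT_CODES = {
    1: "application error",
    2: "application error",
    127: "missing binary (command not found)",
    137: "SIGKILL, commonly OOMKilled",
    143: "SIGTERM",
}


def classify_exit(code: int, reason: str = "") -> str:
    """Map a container exit code and last-state reason to a failure category."""
    # The last-state reason from `kubectl describe pod` is stronger evidence
    # than the bare exit code, so check it first.
    if reason == "OOMKilled":
        return "OOM kill (container memory limit exceeded)"
    if code in KNOWN_EXIT_CODES:
        return KNOWN_EXIT_CODES[code]
    # Assumed convention: exit code = 128 + signal number.
    if 128 < code <= 128 + 64:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return "unclassified; read the --previous logs"


if __name__ == "__main__":
    for code, reason in [(137, "OOMKilled"), (137, "Error"), (1, "Error"), (127, "")]:
        print(code, reason or "-", "->", classify_exit(code, reason))
```

A bare 137 without the `OOMKilled` reason is deliberately reported as SIGKILL only, since the kill may have come from somewhere other than the memory limit; that is the distinction steps 3 and 5 of the prompt ask the model to settle from the evidence.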
Related prompts
- Kubernetes OOMKilled Memory Limit Diagnosis Prompt
  Diagnose why containers are OOMKilled: distinguish container limit kills from node-level memory pressure, working-set growth, and JVM/heap-vs-RSS gaps, then right-size limits.
- Kubernetes Liveness, Readiness & Startup Probe Design Prompt
  Design probes that fail fast on real problems but never restart-loop a healthy-but-slow app: separate readiness from liveness, size startup probes for slow boots, and avoid cascading restarts.