AI for Kubernetes & Helm

CrashLoopBackOff Debugging Prompt

Drill into a specific CrashLoopBackOff failure — application crash, missing config, init container failure, or probe-driven kill — and find the actual cause.

Target user: Kubernetes admins debugging pods stuck in CrashLoopBackOff
Difficulty: Intermediate
Tools: Claude, ChatGPT, Cursor

The prompt

You are a senior Kubernetes SRE who has root-caused hundreds of CrashLoopBackOff incidents. You know the difference between symptom and cause.

A pod is in CrashLoopBackOff. Your job:

1. Determine **which of these specific scenarios** is happening:
   - **Application crash on startup** — the container exits non-zero before getting healthy. Look in `--previous` logs for the exception.
   - **Missing or wrong config** — ConfigMap/Secret value missing, malformed YAML, wrong env var, or volume mount path wrong.
   - **Init container failure** — main container never starts because init failed. Check `kubectl describe` for init container status.
   - **Liveness probe killing it** — container starts, probe fails, kubelet kills it, restart, repeat. Check probe config vs actual readiness time.
   - **OOMKilled in a loop** — limit too low or memory leak. Container exits with code 137. Check `lastState.terminated.reason`.
   - **Permission denied / read-only filesystem** — security context too strict for what the app needs to write.
   - **Image entrypoint wrong** — `CrashLoopBackOff` with `exit code 127` (command not found) or `126` (not executable).
2. Quote the exact log line or describe output that supports your diagnosis.
3. Recommend the targeted fix (config change, manifest patch, image rebuild). Show me the YAML diff.
4. If you need additional data (e.g., previous container logs, the actual entrypoint, file permissions inside the image), ask for it before guessing.

Pod:
- Name: [POD NAME]
- Namespace: [NS]
- Image: [IMAGE:TAG]
- Restart count: [N]

`kubectl describe pod`:
```
[PASTE — make sure 'Last State' and 'Events' sections are included]
```

Current container logs:
```
[PASTE]
```

PREVIOUS container logs (the one that crashed):
```
[PASTE — kubectl logs <pod> --previous]
```

Pod spec fragment (containers, env, volumeMounts):
```yaml
[PASTE]
```

Why this prompt works

“CrashLoopBackOff” is one of Kubernetes’s most overloaded status reasons. It says only that a container keeps exiting and the kubelet is delaying the next restart; the underlying cause could be a dozen different things. This prompt identifies the specific scenario first, then targets the fix.

How to use it

  1. Always grab --previous logs. The current container has barely started before crashing again. The previous container’s logs hold the actual error.
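  2. Note the exit code in lastState.terminated. It’s a huge clue: 137 = SIGKILL (OOMKilled if the reason field says so, otherwise often a kill after the termination grace period ran out), 143 = SIGTERM, 1 = generic app error, 126 = not executable, 127 = command not found, 139 = SIGSEGV.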
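  3. Check the restart count against the pod’s age. By default the kubelet backs off exponentially between restarts (10s, 20s, 40s, and so on, capped at 5 minutes), so 12 restarts inside an hour means the container dies almost immediately every time, while 12 restarts spread over 24 hours points to a process that runs for a while before failing.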
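
As a rough aid for steps 2 and 3, here is a minimal Python sketch that decodes an exit code and estimates the shortest time a given restart count can take. The function names are illustrative and not part of any Kubernetes tooling. The back-off model assumes the kubelet's documented default schedule (10s doubling up to a 5-minute cap) and treats the first restart as immediate; clusters with non-default settings will differ.

```python
# Hypothetical helpers for reading lastState.terminated; not an official API.

# Only the signals this page mentions, hard-coded so the sketch is portable.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

KNOWN_CODES = {
    0: "clean exit",
    1: "generic application error",
    126: "command found but not executable",
    127: "command not found",
}


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short hint."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if code > 128:
        # Convention: 128 + N means the process was terminated by signal N.
        sig = code - 128
        return "killed by " + SIGNAL_NAMES.get(sig, f"signal {sig}")
    return "application-defined exit code"


def min_seconds_for_restarts(restarts: int, base: int = 10, cap: int = 300) -> int:
    """Lower bound on elapsed seconds for `restarts` restarts.

    Assumption: the first restart is immediate, later ones wait
    base, 2*base, 4*base, ... seconds, capped at `cap`. Container
    run time is ignored, so the real figure is always larger.
    """
    total, delay = 0, base
    for _ in range(max(restarts - 1, 0)):
        total += delay
        delay = min(delay * 2, cap)
    return total
```

Under these assumptions `min_seconds_for_restarts(12)` works out to about 35 minutes, so a restart count of 12 on a much younger pod is worth double-checking (for example, the count may span several containers).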

What to paste

kubectl describe pod <name> -n <ns>
kubectl logs <name> -n <ns> --previous --tail=200
kubectl get pod <name> -n <ns> -o jsonpath='{.status.containerStatuses[*].lastState}' | jq
kubectl get pod <name> -n <ns> -o jsonpath='{.spec.containers[*].command}'
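
If you would rather pull the key fields out of the full pod JSON than eyeball the describe output, something like the following sketch may help. It assumes you have the output of `kubectl get pod <name> -n <ns> -o json` loaded as a dict; `summarize_last_states` and `SAMPLE_POD` are hypothetical names, and the sample values are made up for illustration.

```python
# Sketch: summarize the last termination of every container in a pod.
# Field names follow the Pod status schema; everything else is illustrative.

def summarize_last_states(pod: dict) -> list:
    """Return one row per init/main container with its last termination."""
    status = pod.get("status", {})
    rows = []
    for kind, key in (("init", "initContainerStatuses"),
                      ("main", "containerStatuses")):
        for cs in status.get(key) or []:
            # lastState is empty if the container has never terminated.
            term = (cs.get("lastState") or {}).get("terminated") or {}
            rows.append({
                "kind": kind,
                "name": cs.get("name"),
                "restartCount": cs.get("restartCount", 0),
                "exitCode": term.get("exitCode"),
                "reason": term.get("reason"),
            })
    return rows


# Made-up pod status for demonstration only.
SAMPLE_POD = {
    "status": {
        "initContainerStatuses": [
            {"name": "migrate", "restartCount": 0, "lastState": {}},
        ],
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 7,
                "lastState": {
                    "terminated": {"exitCode": 137, "reason": "OOMKilled"},
                },
            },
        ],
    }
}

for row in summarize_last_states(SAMPLE_POD):
    print(row)
```

Including init containers in the summary matters because a failing init container keeps the main container from ever starting, and its status lives in a separate field.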

Common patterns this catches

  • Exit code 137 with reason OOMKilled → memory limit hit. Raise limits or fix the leak.
  • Exit code 1 with a stack trace in previous logs → app bug. Patch the app, redeploy.
  • Exit code 143 or 137 (or 0 if the app shuts down cleanly on SIGTERM) with Liveness probe failed events → probe is too aggressive for the app’s startup time. Increase initialDelaySeconds or add a startupProbe.
  • Exit code 127 → entrypoint command not found in the image. Wrong base image or wrong CMD/ENTRYPOINT.
  • Back-off restarting failed container with no logs at all → an init container may have failed, or the main process died before writing anything. Check kubectl describe for init container status; a pod status of Init:CrashLoopBackOff confirms it.
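
The patterns above can be sketched as a small decision function. This is a simplification that assumes you have already extracted the exit code, the reason field, and the event messages. The function name, parameters, and scenario labels are all hypothetical, and real incidents can match more than one rule.

```python
from typing import Iterable, Optional


def triage(
    exit_code: Optional[int],
    reason: Optional[str] = None,
    events: Iterable[str] = (),
    init_failed: bool = False,
    has_previous_logs: bool = True,
) -> str:
    """Map the evidence to one scenario label; the first matching rule wins."""
    if init_failed:
        # The main container never ran, so its exit code is not meaningful.
        return "init_container_failure"
    if reason == "OOMKilled":
        # Check the reason before the code: 137 alone only means SIGKILL.
        return "oom_killed"
    if any("Liveness probe failed" in message for message in events):
        return "liveness_probe_kill"
    if exit_code == 127:
        return "entrypoint_not_found"
    if exit_code == 126:
        return "entrypoint_not_executable"
    if exit_code not in (None, 0) and has_previous_logs:
        # Non-zero exit with logs available: look for the exception there.
        return "application_crash"
    # Not enough evidence; gather more data before guessing.
    return "need_more_data"
```

The rule order is a design choice: reason and events are checked before the bare exit code because 137 shows up for both OOM kills and probe-driven kills that outlast the grace period.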
