DevOps AI ToolKit
AI for Kubernetes & Helm · By James Joyner IV · 10 min read

Kubernetes Error Guide: 'CrashLoopBackOff' Pod Restart Loop

Fix the CrashLoopBackOff pod restart loop in Kubernetes: diagnose startup crashes, failing liveness probes, missing dependencies, OOM during init, and bad entrypoints.

  • #kubernetes
  • #troubleshooting
  • #errors
  • #pods

Overview

CrashLoopBackOff is not an error in itself. It is the state Kubernetes reports when a container keeps crashing and the kubelet is backing off between restart attempts. The container starts, exits (cleanly or with an error), and the kubelet restarts it after an exponential back-off delay (10s, 20s, 40s, and so on, capped at 5 minutes). Until the underlying cause is fixed, the pod keeps cycling through crash and back-off indefinitely.

You will see this in the pod status column:

NAME                        READY   STATUS             RESTARTS      AGE
api-7c9d8f6b4d-2xqlk        0/1     CrashLoopBackOff   6 (42s ago)   4m12s

It occurs whenever the container’s main process terminates shortly after start: a config error that makes the app exit, a liveness probe that keeps killing it, a dependency that is unreachable at boot, an out-of-memory kill during initialization, or simply a wrong command that exits immediately. The back-off itself is only the kubelet throttling restarts for that one container; the root cause is almost always in the container’s startup path.
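To see how quickly the delays grow, here is a small shell sketch that prints the first N restart delays, assuming the documented kubelet behaviour: start at 10 seconds, double after each crash, cap at 300 seconds. The function name backoff_schedule is hypothetical, and the real kubelet also resets the delay once a container has run for 10 minutes without crashing, which this sketch ignores.

```shell
# backoff_schedule N: print the first N restart delays (seconds), assuming
# the kubelet's back-off starts at 10s, doubles each time, and caps at 300s.
# Hypothetical helper for illustration; it does not query the cluster.
backoff_schedule() {
  local n=$1 delay=10 i=0 out=""
  while [ "$i" -lt "$n" ]; do
    out="$out $delay"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "${out# }"   # drop the leading space
}

backoff_schedule 7   # prints: 10 20 40 80 160 300 300
```

From the sixth restart on, each wait is the full five minutes, which is why a crash-looping pod can look idle for long stretches.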

Symptoms

  • Pod cycles between Running, Error/Completed, and CrashLoopBackOff with a climbing RESTARTS count.
  • READY stays at 0/1 (or 0/N).
  • kubectl describe pod shows Last State: Terminated with a non-zero exit code.
  • The previous container’s logs contain a stack trace or a fatal config message.

kubectl get pods -l app=api
NAME                        READY   STATUS             RESTARTS      AGE
api-7c9d8f6b4d-2xqlk        0/1     CrashLoopBackOff   6 (42s ago)   4m12s
kubectl describe pod api-7c9d8f6b4d-2xqlk | grep -A6 'Last State'
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 23 Jun 2026 14:02:08 +0000
      Finished:     Mon, 23 Jun 2026 14:02:09 +0000
    Restart Count:  6
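To spot looping pods in a busy namespace, the STATUS column can be filtered with awk. This hypothetical crashlooping_pods helper assumes the default kubectl get pods table without -A (with -A a NAMESPACE column is prepended and the field numbers shift), and it only catches pods that are in the back-off phase at the moment you run it.

```shell
# crashlooping_pods: read `kubectl get pods` output (no -A) on stdin and
# print "NAME RESTARTS" for each pod whose STATUS is CrashLoopBackOff.
# Field 1 is NAME, field 3 is STATUS, field 4 is the restart count.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Usage (needs cluster access):
#   kubectl get pods -n <NS> | crashlooping_pods
```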

Common Root Causes

1. App crashes on startup from bad config or missing env var

The most common cause: the container starts, reads its configuration, finds something missing or invalid, and exits with a non-zero code. The logs from the previous (crashed) container usually state exactly what is wrong.

kubectl logs api-7c9d8f6b4d-2xqlk --previous
2026-06-23T14:02:09Z FATAL config: required environment variable DATABASE_URL is not set
panic: missing DATABASE_URL
goroutine 1 [running]:
main.mustEnv(...)
exit status 1

A missing ConfigMap key, a typo in an env var name, or a malformed config file all land here. Fix the env/ConfigMap and the pod stabilizes.
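One way to make this class of failure obvious is a fail-fast guard in the container entrypoint that checks every required variable before the app starts and names all missing ones in a single message. This is a sketch: require_env is a hypothetical helper, and the variable names are the ones used in this guide's examples.

```shell
# require_env NAME...: succeed only if every named variable is set and
# non-empty; otherwise print all offenders in one FATAL line and return 1.
# Hypothetical entrypoint helper; names must be valid shell identifiers.
require_env() {
  local name value missing=""
  for name in "$@"; do
    eval "value=\${$name:-}"          # indirect lookup of $name
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "FATAL: missing or empty environment variables:$missing" >&2
    return 1
  fi
}

# Usage at the top of an entrypoint script:
#   require_env DATABASE_URL REDIS_ADDR || exit 1
#   exec /app/server --serve
```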

2. Failing liveness probe keeps killing the container

If a liveness probe fails repeatedly, the kubelet kills and restarts the container even when the app itself is fine — often because the probe path/port is wrong or the initialDelaySeconds is too short for a slow-starting app.

kubectl describe pod api-7c9d8f6b4d-2xqlk | grep -A4 'Liveness'
    Liveness:   http-get http://:8080/healthz delay=5s timeout=1s period=10s #success=1 #failure=3
kubectl get events --field-selector involvedObject.name=api-7c9d8f6b4d-2xqlk | grep -i probe
  Warning  Unhealthy  Liveness probe failed: Get "http://10.244.1.7:8080/healthz": dial tcp 10.244.1.7:8080: connect: connection refused
  Warning  Killing    Container api failed liveness probe, will be restarted

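Here the app listens on :9090 but the probe checks :8080, so every probe gets connection refused and the kubelet kills the container on every cycle. Point the probe at the port and path the app actually serves; if the app is merely slow to start, raise initialDelaySeconds or add a startupProbe instead.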
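The port mismatch can be checked mechanically from the describe output. A sketch, assuming a single container with an HTTP liveness probe on a numeric port (a named port such as http appears as a name and is not matched). liveness_port and check_liveness_port are hypothetical helpers, and the expected port (9090 in this example) comes from your knowledge of the app.

```shell
# liveness_port: read `kubectl describe pod` output on stdin and print the
# numeric port of the HTTP liveness probe, e.g. 8080 from
# "Liveness:   http-get http://:8080/healthz delay=5s ...".
liveness_port() {
  sed -n 's|.*Liveness:[[:space:]]*http-get http://[^:/]*:\([0-9][0-9]*\)/.*|\1|p'
}

# check_liveness_port EXPECTED: fail with a message if the probe port on
# stdin differs from the port the app is known to listen on.
check_liveness_port() {
  local expected=$1 actual
  actual=$(liveness_port)
  if [ "$actual" != "$expected" ]; then
    echo "liveness probe targets port ${actual:-unknown}, app listens on $expected" >&2
    return 1
  fi
}

# Usage (needs cluster access):
#   kubectl describe pod <POD> | check_liveness_port 9090
```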

3. Missing dependency or unreachable service at boot

An app that connects to a database, cache, or upstream API during startup will crash if that dependency is not ready or not reachable, especially on first deploy when everything starts at once.

kubectl logs api-7c9d8f6b4d-2xqlk --previous | tail -5
2026-06-23T14:02:09Z ERROR failed to connect to postgres host=db.prod.svc.cluster.local:5432
dial tcp 10.96.34.12:5432: connect: connection refused
2026-06-23T14:02:09Z FATAL cannot start without database, exiting

The fix is usually an init container or retry-with-backoff in the app so it tolerates a not-yet-ready dependency rather than exiting.
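A minimal retry-with-backoff loop in shell, usable as the command of an init container or as an entrypoint wrapper. The helper name retry_with_backoff and the RETRY_INITIAL_DELAY variable are hypothetical; the usage line assumes the image ships an nc that supports -z, which not every image does.

```shell
# retry_with_backoff MAX_ATTEMPTS CMD [ARGS...]
# Run CMD until it succeeds, doubling the pause between attempts.
# The first pause is RETRY_INITIAL_DELAY seconds (default 1).
# Returns CMD's last exit status if every attempt fails.
retry_with_backoff() {
  local max=$1 attempt=1 delay=${RETRY_INITIAL_DELAY:-1} status
  shift
  while :; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (status $status), retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage in an init container (assumes nc with -z is available):
#   retry_with_backoff 8 nc -z db.prod.svc.cluster.local 5432
```

Waiting in an init container keeps the main container from starting at all until the dependency answers, so the app itself never enters the crash loop.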

4. OOMKilled during initialization

If the container exceeds its memory limit while loading data, building a cache, or starting a heavy runtime, the kernel OOM-kills it before it can serve traffic, producing exit code 137 and a crash loop.

kubectl describe pod api-7c9d8f6b4d-2xqlk | grep -A5 'Last State'
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 23 Jun 2026 14:02:06 +0000
      Finished:     Mon, 23 Jun 2026 14:02:08 +0000

Reason: OOMKilled plus exit code 137 means the limit is too low for the startup workload. Raise resources.limits.memory or reduce what the app loads at boot.
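To size the limit for the startup peak, compare the peak memory observed during startup (from your metrics or a load test) against the configured limit. This sketch only handles integer quantities with the binary suffixes Ki, Mi, and Gi; decimal suffixes (M, G), fractions such as 1.5Gi, and plain byte counts are rejected rather than guessed. mem_mib and fits_limit are hypothetical helpers, and the 20% default headroom is an arbitrary assumption.

```shell
# mem_mib QUANTITY: convert an integer memory quantity with a Ki/Mi/Gi
# suffix to MiB (Ki rounds up). Anything else is rejected with status 1.
mem_mib() {
  local num unit
  num=${1%??}          # everything except the last two characters
  unit=${1#"$num"}     # the last two characters
  case $num in
    ''|*[!0-9]*) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
  case $unit in
    Ki) echo $(( (num + 1023) / 1024 )) ;;
    Mi) echo "$num" ;;
    Gi) echo $(( num * 1024 )) ;;
    *)  echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# fits_limit PEAK LIMIT [HEADROOM_PCT]: succeed if LIMIT covers PEAK plus
# HEADROOM_PCT percent (default 20). Returns 2 on an unparsable quantity.
fits_limit() {
  local peak limit headroom=${3:-20}
  peak=$(mem_mib "$1") || return 2
  limit=$(mem_mib "$2") || return 2
  [ "$limit" -ge $(( peak + peak * headroom / 100 )) ]
}
```

For example, fits_limit 900Mi 1Gi fails (900 MiB plus 20% is 1080 MiB, above the 1024 MiB limit), which points at raising resources.limits.memory.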

5. Wrong command or entrypoint that exits immediately

If command/args (or the image’s entrypoint) points at something that runs and finishes — or a binary that does not exist — the container “completes” instantly and the kubelet restarts it endlessly.

kubectl logs api-7c9d8f6b4d-2xqlk --previous
exec /app/server: no such file or directory
kubectl get pod api-7c9d8f6b4d-2xqlk -o jsonpath='{.spec.containers[0].command}'
["/app/server","--serve"]

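Either the binary is not at that path in the image, or it exists but its dynamic loader does not (for example, a glibc-linked binary copied into an Alpine or scratch image). A binary built for the wrong CPU architecture usually fails with exec format error instead. A container whose command is, say, echo hello will also loop because it exits 0 immediately, and the default restartPolicy is Always.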
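In CI, a quick existence and permission check on the entrypoint path catches the missing-binary case before deploy. check_entrypoint is a hypothetical helper meant to run inside the image, which assumes the image has a shell (scratch and distroless images do not). It cannot detect a missing dynamic loader, which also surfaces as no such file or directory.

```shell
# check_entrypoint PATH: print "ok", "missing", or "not executable" for
# an entrypoint binary and return non-zero unless it is ok.
check_entrypoint() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"
    return 1
  elif [ ! -x "$1" ]; then
    echo "not executable: $1"
    return 1
  fi
  echo "ok: $1"
}

# The same test can be run inline inside the image, for example:
#   docker run --rm --entrypoint /bin/sh <IMAGE> -c '[ -x /app/server ] && echo ok'
```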

6. File or permission errors at startup

A read-only root filesystem, a non-root securityContext that cannot write to a needed path, or a missing mounted volume causes the process to fail when it tries to open or write a file.

kubectl logs api-7c9d8f6b4d-2xqlk --previous | tail -3
2026-06-23T14:02:09Z FATAL open /var/run/app/app.pid: permission denied
2026-06-23T14:02:09Z app running as uid=1000, /var/run/app owned by root:root mode 0755
exit status 1

Fix the volume mount, adjust fsGroup/runAsUser, or write to an emptyDir the process can actually access.
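A pre-flight check at the top of an entrypoint turns a confusing mid-startup crash into one explicit message that includes the uid. preflight_writable is a hypothetical helper; /var/run/app is the path from the example above.

```shell
# preflight_writable DIR...: report every directory that is missing or not
# writable by the current user, then return 1 if any failed.
preflight_writable() {
  local dir bad=0
  for dir in "$@"; do
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
      echo "FATAL: $dir is missing or not writable by uid=$(id -u)" >&2
      bad=1
    fi
  done
  return "$bad"
}

# Usage at the top of an entrypoint script:
#   preflight_writable /var/run/app || exit 1
```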

Diagnostic Workflow

Step 1: Confirm the state and restart count

kubectl get pods -l app=<APP> -o wide

A climbing RESTARTS with CrashLoopBackOff confirms the loop; note the node and age.

Step 2: Read the previous container’s logs

kubectl logs <POD> --previous
kubectl logs <POD> --previous --tail=50

The crashed container’s stdout/stderr almost always contains the fatal message. Use --previous: the current container is the freshly restarted instance, so its logs may be empty or may not include the crash yet.

Step 3: Inspect the termination reason and exit code

kubectl describe pod <POD>
kubectl get pod <POD> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

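Map the exit code: 1 or 2 usually means an application error, 137 is SIGKILL (128+9, typically an OOM kill), 139 is a segfault (SIGSEGV), and 143 is SIGTERM (128+15). Check the terminated Reason as well (Error, OOMKilled, Completed); CrashLoopBackOff shows up as the waiting reason of the current state, not as a termination reason.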
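The mapping can be kept handy as a small lookup function. Codes above 128 mean the process was killed by signal (code minus 128). explain_exit_code is a hypothetical helper, and its descriptions are the usual interpretations, not guarantees.

```shell
# explain_exit_code CODE: translate a container exit code into a likely cause.
explain_exit_code() {
  case $1 in
    0)   echo "exited cleanly, but restartPolicy Always restarts it anyway" ;;
    137) echo "SIGKILL (128+9): OOM kill if Reason is OOMKilled, otherwise a forced kill" ;;
    139) echo "SIGSEGV (128+11): segmentation fault" ;;
    143) echo "SIGTERM (128+15): asked to stop, e.g. after a failed liveness probe" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application error (exit $1): check kubectl logs --previous"
      fi
      ;;
  esac
}

# Usage (needs cluster access):
#   explain_exit_code "$(kubectl get pod <POD> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
```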

Step 4: Check probes and events

kubectl describe pod <POD> | grep -E 'Liveness|Readiness|Startup'
kubectl get events --field-selector involvedObject.name=<POD> --sort-by='.lastTimestamp'

Unhealthy/Killing events point at a probe killing a healthy app rather than the app crashing on its own.
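To tell probe kills apart from crashes at a glance, the event lines can be tallied. A sketch: probe_kill_summary is hypothetical, and it counts event lines, not occurrences. kubectl aggregates repeated events into a single line, so treat the numbers as a yes/no signal rather than an exact count.

```shell
# probe_kill_summary: read event lines on stdin and print how many mention
# Unhealthy and how many are Killing events that mention a probe.
probe_kill_summary() {
  awk '/Unhealthy/ { unhealthy++ }
       /Killing/ && /probe/ { kills++ }
       END { printf "unhealthy=%d probe_kills=%d\n", unhealthy, kills }'
}

# Usage (needs cluster access):
#   kubectl get events --field-selector involvedObject.name=<POD> | probe_kill_summary
```

A non-zero probe_kills with no fatal message in kubectl logs --previous points at the probe rather than the app.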

Step 5: Verify config, command, and resources

kubectl get pod <POD> -o jsonpath='{.spec.containers[0].command} {.spec.containers[0].args}'
kubectl describe pod <POD> | grep -A3 'Limits\|Requests'
kubectl get configmap,secret -n <NS>

Confirm the entrypoint exists, the referenced ConfigMap/Secret keys are present, and the memory limit is adequate.
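To confirm that the keys a workload reads actually exist, list the ConfigMap's keys with a go-template and diff them against the required names. missing_keys is a hypothetical helper; the required key list has to come from your app's code or documentation.

```shell
# missing_keys REQUIRED_KEY...: read one present key per line on stdin and
# print each required key that is absent; return 1 if any are missing.
missing_keys() {
  local present key status=0
  present=$(cat)
  for key in "$@"; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Usage (needs cluster access):
#   kubectl get configmap checkout-config \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | missing_keys REDIS_ADDR
```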

Example Root Cause Analysis

A new deploy of checkout-api rolls out and every pod lands in CrashLoopBackOff within seconds.

kubectl get pods -l app=checkout-api
NAME                              READY   STATUS             RESTARTS      AGE
checkout-api-58fbd6c4c7-7p2nv     0/1     CrashLoopBackOff   5 (18s ago)   2m41s

The describe output shows an ordinary application error (exit code 1), not an OOM kill:

    Last State:     Terminated
      Reason:       Error
      Exit Code:    1

So we read the previous container’s logs:

kubectl logs checkout-api-58fbd6c4c7-7p2nv --previous | tail -4
2026-06-23T14:05:11Z FATAL config: REDIS_ADDR is empty
panic: redis address required
exit status 1

The app needs REDIS_ADDR, which comes from a ConfigMap. Checking it:

kubectl get configmap checkout-config -o jsonpath='{.data.REDIS_ADDR}'
(empty)

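The ConfigMap was rolled out with a renamed key (REDIS_HOST instead of REDIS_ADDR), but the Deployment still expects REDIS_ADDR. The variable ends up empty, the app refuses to start, and the kubelet loops it. (A non-optional configMapKeyRef pointing at a missing key would instead leave the pod in CreateContainerConfigError; reaching a crash loop means the key is pulled in through envFrom or marked optional.)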

Fix: align the ConfigMap key with what the Deployment expects, then restart the Deployment, because environment variables sourced from a ConfigMap are only read when a container starts:

kubectl patch configmap checkout-config --type merge \
  -p '{"data":{"REDIS_ADDR":"redis.prod.svc.cluster.local:6379"}}'
kubectl rollout restart deployment checkout-api

The pods come up 1/1 Running and the restart count stops climbing.

Prevention Best Practices

  • Always check kubectl logs --previous first — the crashed container’s output is the single most useful artifact for a crash loop.
  • Make startup tolerant of not-yet-ready dependencies: use init containers or in-app retry-with-backoff instead of exiting on the first failed connection.
  • Set initialDelaySeconds/startupProbe generously for slow-booting apps so a liveness probe never kills a container that simply needs more time to start.
  • Validate ConfigMap and Secret keys against what the app reads before rollout; a renamed or missing key is a leading cause of startup crashes.
  • Size resources.limits.memory for the peak of initialization, not steady state, so the container is not OOM-killed while loading data.
  • Pin and test image entrypoints in CI so a wrong command or architecture mismatch never reaches the cluster.
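For the startup probe, the Kubernetes documentation describes the startup window as roughly failureThreshold times periodSeconds (for example 30 x 10 = 300 seconds). A sketch for checking that window against the slowest boot you have measured; startup_budget and startup_budget_ok are hypothetical, initialDelaySeconds is ignored, and the 50% default margin is an arbitrary assumption.

```shell
# startup_budget FAILURE_THRESHOLD PERIOD_SECONDS: approximate seconds a
# startupProbe allows before the kubelet gives up and restarts the container.
startup_budget() {
  echo $(( $1 * $2 ))
}

# startup_budget_ok FAILURE_THRESHOLD PERIOD_SECONDS WORST_BOOT_SECONDS [MARGIN_PCT]
# Succeed if the budget covers the worst measured boot plus MARGIN_PCT (default 50).
startup_budget_ok() {
  local budget margin=${4:-50}
  budget=$(startup_budget "$1" "$2")
  [ "$budget" -ge $(( $3 + $3 * margin / 100 )) ]
}
```

With failureThreshold 3 and periodSeconds 10, a 40-second boot fails the check (30 seconds of budget against 60 needed), so the probe would restart the container before it finishes starting.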

Quick Command Reference

# See the crash loop and restart count
kubectl get pods -l app=<APP> -o wide

# The crashed container's logs (most important)
kubectl logs <POD> --previous --tail=50

# Termination reason and exit code
kubectl describe pod <POD> | grep -A6 'Last State'
kubectl get pod <POD> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Probe config and probe failures
kubectl describe pod <POD> | grep -E 'Liveness|Readiness|Startup'
kubectl get events --field-selector involvedObject.name=<POD> --sort-by='.lastTimestamp'

# Command/entrypoint and resource limits
kubectl get pod <POD> -o jsonpath='{.spec.containers[0].command} {.spec.containers[0].args}'
kubectl describe pod <POD> | grep -A3 'Limits'

# Confirm referenced config exists
kubectl get configmap,secret -n <NS>

# Restart the workload after a fix
kubectl rollout restart deployment <DEPLOY>

Conclusion

CrashLoopBackOff means a container keeps exiting shortly after start and the kubelet is now backing off between restarts. The usual root causes:

  1. The app crashes on startup from a bad config or a missing/empty environment variable.
  2. A misconfigured liveness probe repeatedly kills an otherwise healthy container.
  3. A required dependency (DB, cache, upstream) is unreachable at boot and the app exits.
  4. The container is OOMKilled (exit code 137) during initialization.
  5. A wrong command/entrypoint or missing binary makes the container exit immediately.
  6. A file or permission error stops the process from opening or writing a needed path.

Start with kubectl logs --previous and the Last State exit code; together they identify almost every crash loop before you touch the manifest.
