Kubernetes ImagePullBackOff Debugging Prompt

Diagnose `ImagePullBackOff` / `ErrImagePull` — wrong image name, private registry auth, imagePullSecrets, image signing/content trust, network reach to the registry.

Target user: Kubernetes engineers debugging pod startup failures
Difficulty: Beginner
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes engineer who has debugged hundreds of `ImagePullBackOff` errors across public registries, private registries (ECR, GAR, ACR, GHCR, Harbor, Quay), and air-gapped clusters.

I will provide:
- The pod that's stuck in `ImagePullBackOff` or `ErrImagePull`
- `kubectl describe pod <pod>` output (Events section is the key part)
- The image reference being pulled
- The image pull secrets attached to the pod's ServiceAccount or pod spec
- The container runtime (containerd / CRI-O) and version
- Whether the cluster has airgap / proxy / image policy webhooks

Your job:

1. **Decode the exact error message** in the Events section — they're informative:
   - `repository does not exist or may require 'docker login'` → image name typo OR private registry without credentials
   - `pull access denied for ... unauthorized: incorrect username or password` → credentials wrong/expired
   - `manifest unknown` → tag doesn't exist (typo, or tag was deleted)
   - `i/o timeout` / `dial tcp ... connect: connection refused` → network reach failure
   - `x509: certificate signed by unknown authority` → registry uses self-signed CA not trusted by node
   - `error validating signature` → image signing (cosign/notary) policy rejected
   - `too many requests` → registry rate limit (Docker Hub anonymous limit, ECR throttling)
   - `requested access to the resource is denied` (often + `unauthorized`) → IAM/registry RBAC issue
   - `BadRequest: ... no kernel modules` → not an image problem at all; a misleading runtime error
2. **Walk the pull chain**:
   - Pod spec → ServiceAccount imagePullSecrets → kubelet → container runtime → registry
   - `imagePullSecrets` MUST be in the same namespace as the pod
   - SA `imagePullSecrets` get merged with pod-level `spec.imagePullSecrets`
   - kubelet reads the secret on each pull (not cached aggressively)
3. **Common root causes by registry**:
   - **Docker Hub**: anonymous rate limit hit (100 pulls/6h per IP); authenticate to get higher limit
   - **ECR**: short-lived tokens (12 hours); use the kubelet `ecr-credential-provider` plugin or a job that refreshes the secret
   - **GAR (Google Artifact Registry)**: GCP IAM via Workload Identity; secret-based auth is fragile
   - **ACR**: managed identity preferred; AKS often has it automatic
   - **GHCR**: GitHub PAT with `read:packages` scope; package visibility matters
   - **Harbor / self-hosted**: TLS cert trust; containerd registry `config_path` and per-host `hosts.toml`; HTTPS vs HTTP
   - **Air-gapped**: image must exist in the local mirror; `imagePullPolicy: IfNotPresent` to avoid pull
4. **Check `imagePullPolicy`**:
   - `Always` (default for `:latest` and no tag) — every pod start pulls
   - `IfNotPresent` — pull only if not cached
   - `Never` — only use local; useful for pre-loaded images on air-gapped nodes
5. **For private registries**, verify the imagePullSecret content:
   - Type must be `kubernetes.io/dockerconfigjson`
   - `.dockerconfigjson` field must contain a valid auth dict for the registry HOST in the image reference
   - `docker.io/library/nginx` and `nginx` both authenticate against `https://index.docker.io/v1/`, so the key in the secret's auth dict must match that host
6. **For self-signed CA registries**, the node's container runtime must trust the CA:
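   - containerd: `/etc/containerd/certs.d/<host>/ca.crt` (requires the CRI registry `config_path` to point at `/etc/containerd/certs.d`), or a `hosts.toml` for that host with `ca` or `skip_verify`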
   - CRI-O: `/etc/containers/certs.d/<host>/ca.crt`
7. **For signature policy failures**: run `cosign verify` against the image with the same keys or identities the policy expects; check the policy controller (Kyverno / sigstore-policy-controller) logs.
8. Mark anything DESTRUCTIVE: setting `imagePullPolicy: Never` on a production workload (locks pods to the currently cached image; pods scheduled onto new nodes fail), force-pull bypass scripts.

---

Pod + namespace: [DESCRIBE]
Image reference (full): [e.g., `myregistry.com/team/app:v1.2.3`]
Container runtime: [containerd / CRI-O + version]
`kubectl describe pod <pod>` (Events section especially):
```
[PASTE]
```
ServiceAccount and its imagePullSecrets:
```yaml
[PASTE kubectl get sa <sa> -o yaml]
```
Pod-spec imagePullSecrets (if any):
```yaml
[PASTE relevant part of pod yaml]
```
The secret content (sanitized — confirm registry HOST and type):
```yaml
[PASTE kubectl get secret <sec> -n <ns> -o yaml]
```

Why this prompt works

ImagePullBackOff has a short and predictable list of causes, but the user-facing message is often misleading (“repository does not exist” can mean wrong name OR private without auth). This prompt forces the model to decode the precise error string before suggesting fixes.

How to use it

  1. Always paste the Events section verbatim. The kubelet logs the exact registry response there — it’s diagnostic gold.
  2. Verify the image reference is exactly what you intend — typos in repo path or tag are the #1 cause.
  3. For private registries, confirm the secret content (sanitized) — wrong host key in the auth dict is invisible from the secret name alone.
  4. Check both namespaces: the secret must live in the pod’s namespace; cross-namespace references don’t work.
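
For step 3, it helps to know which registry host the runtime derives from an image reference, since that is the host the secret's `auths` map has to cover. Below is a minimal sketch of the usual Docker-style rule: the first path component counts as a registry only if it contains a `.` or `:` or is `localhost`; otherwise the image is on Docker Hub. The function name `registry_host` is hypothetical, and the sketch ignores node-level mirror configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the registry host implied by an image reference.
registry_host() {
  local ref="$1" first
  # No slash at all (e.g. "nginx:1.27"): an official Docker Hub image.
  if [[ "$ref" != */* ]]; then
    echo "docker.io"
    return
  fi
  first="${ref%%/*}"   # text before the first slash
  if [[ "$first" == *.* || "$first" == *:* || "$first" == "localhost" ]]; then
    echo "$first"      # looks like a hostname or host:port
  else
    echo "docker.io"   # e.g. "team/app" is a Docker Hub user or org repository
  fi
}

registry_host "myregistry.com/team/app:v1.2.3"
registry_host "team/app:v1.2.3"
```

Compare the result with the keys printed by `jq -r '.auths | keys[]'` on the decoded secret. For Docker Hub images the key is typically `https://index.docker.io/v1/` rather than `docker.io`.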

Useful commands

# Diagnose
kubectl describe pod <pod>
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].state}'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning

# ServiceAccount and pod-level secrets
kubectl get sa <sa> -n <ns> -o yaml
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.imagePullSecrets}'

# Inspect a Docker pull secret
kubectl get secret <secret> -n <ns> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq

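# Test pull from a node directly (if you can SSH); these bypass imagePullSecrets, so pass credentials explicitly for private images (e.g. crictl pull --creds <user>:<pass>)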
sudo crictl pull <image>          # containerd / CRI-O
sudo ctr -n=k8s.io images pull <image>   # containerd directly
sudo podman pull <image>          # if podman installed

# Test registry reachability from the pod network (debug pod); image pulls use the node's network, so results can differ
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add curl; curl -v https://<registry>/v2/"

# Check registry endpoint reachability + TLS
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add openssl; openssl s_client -connect <registry>:443 -servername <registry> < /dev/null"

# Create a docker-registry secret (in the pod's namespace)
kubectl create secret docker-registry myregcred -n <ns> \
  --docker-server=<registry-host> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --docker-email=<email>

# Attach to a ServiceAccount (the patch sets the whole imagePullSecrets list, so include any existing entries)
kubectl patch sa default -n <ns> -p '{"imagePullSecrets":[{"name":"myregcred"}]}'

# Force re-pull (delete pods so they recreate; image cache on node may still be used)
kubectl rollout restart deploy <deploy>
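
When inspecting a pull secret, it helps to know what a well-formed payload looks like. The sketch below builds the minimal `.dockerconfigjson` structure by hand: an `auths` map keyed by registry host, with `auth` set to base64 of `user:password`. The function name `dockerconfigjson` is hypothetical, the values are placeholders, and the sketch does not JSON-escape special characters, so treat it as an illustration rather than a replacement for `kubectl create secret docker-registry`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a minimal dockerconfigjson payload for one registry.
dockerconfigjson() {
  local host="$1" user="$2" pass="$3" auth
  # "auth" is base64("user:password"); strip the newlines some base64 tools add.
  auth="$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')"
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$host" "$auth"
}

dockerconfigjson "ghcr.io" "user" "pass"
```

If the decoded secret from the cluster has a different host key than the image reference, or an `auth` value that does not decode to the expected `user:password`, the pull will fail even though the secret exists.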

Error message decoder

| Error fragment | Most likely cause | Fix |
| --- | --- | --- |
| `repository does not exist or may require 'docker login'` | Typo OR private without auth | Verify reference; add imagePullSecret |
| `pull access denied ... unauthorized` | Credentials wrong/expired | Rotate secret; check token TTL |
| `manifest unknown` | Tag doesn’t exist | Verify tag at registry; rebuild and push |
| `manifest for ... not found` | Same as above | Same |
| `i/o timeout` | Network reach | Check egress, NetworkPolicy, proxy |
| `connection refused` | Wrong registry port or registry down | `nc -vz <host> 443` from a debug pod |
| `x509: certificate signed by unknown authority` | Self-signed CA not trusted | Install CA on nodes (containerd certs.d) |
| `too many requests` | Rate limit | Authenticate; cache; mirror |
| `invalid reference format` | Bad image string (e.g., uppercase in the repository name, spaces) | Lowercase the repository path; remove spaces |
| `Got permission denied while ...` | Filesystem on node, not registry | Check kubelet write perms |
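
The decoder table can be sketched as a small shell function for triaging event messages in bulk. The function name `classify_pull_error` and the category labels are hypothetical, and the patterns cover only the fragments listed above. The extra `toomanyrequests` pattern is an assumption about how Docker Hub spells its rate-limit error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a pull-failure event message to a likely cause.
classify_pull_error() {
  local msg
  # Lowercase first so "Too Many Requests" and similar variants match.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    # Checked first: this message often also contains "pull access denied".
    *"repository does not exist"*)                        echo "typo-or-missing-auth" ;;
    *"manifest unknown"*|*"manifest for "*" not found"*)  echo "tag-not-found" ;;
    *"x509: certificate signed by unknown authority"*)    echo "untrusted-ca" ;;
    *"too many requests"*|*"toomanyrequests"*)            echo "rate-limit" ;;
    *"i/o timeout"*|*"connection refused"*)               echo "network" ;;
    *"invalid reference format"*)                         echo "bad-reference" ;;
    *"unauthorized"*|*"pull access denied"*)              echo "bad-credentials" ;;
    *)                                                    echo "unknown" ;;
  esac
}

classify_pull_error "dial tcp 10.0.0.5:443: connect: connection refused"
```

The order of the patterns matters: the ambiguous "repository does not exist" case is matched before the generic credential patterns so that a typo is not misreported as a bad password.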

ECR credential pattern (preferred over static)

# kubelet flags (set on each node, e.g. in the kubelet service unit or node bootstrap config)
--image-credential-provider-config=/etc/kubernetes/credential-providers.yaml
--image-credential-provider-bin-dir=/usr/local/bin

# /etc/kubernetes/credential-providers.yaml
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
  matchImages:
  - "*.dkr.ecr.*.amazonaws.com"
  defaultCacheDuration: "12h"
  apiVersion: credentialprovider.kubelet.k8s.io/v1

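This lets the kubelet fetch fresh ECR credentials on demand instead of relying on a static secret that expires with the 12-hour ECR token. The provider `name` must match the name of the plugin binary in the bin directory.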
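
A common surprise with `matchImages` is that each `*` matches only a single domain segment. The sketch below approximates that rule for hosts only, ignoring ports and paths; it is a rough model of the documented behaviour, not the kubelet's actual implementation. The function name `match_image_host` and the example account ID are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the host matches the pattern, with each "*"
# confined to one dot-separated segment.
match_image_host() {
  local pattern="$1" host="$2" i
  local -a p h
  IFS='.' read -r -a p <<< "$pattern"
  IFS='.' read -r -a h <<< "$host"
  # A glob cannot span segments, so the segment counts must be equal.
  (( ${#p[@]} == ${#h[@]} )) || return 1
  for i in "${!p[@]}"; do
    # Unquoted right-hand side: [[ ]] treats it as a glob pattern.
    [[ "${h[i]}" == ${p[i]} ]] || return 1
  done
}

if match_image_host '*.dkr.ecr.*.amazonaws.com' '123456789012.dkr.ecr.us-east-1.amazonaws.com'; then
  echo "provider applies"
fi
```

Under this rule, a host with an extra suffix segment such as `amazonaws.com.cn` would not match `*.dkr.ecr.*.amazonaws.com` and would need its own `matchImages` entry.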

Common findings this catches

  • Image typo: nginx:1.27 exists, ngnix:1.27 doesn’t — copy from registry directly.
  • Pod in default namespace, secret in kube-system → ImagePullSecrets are namespace-scoped. Copy secret to pod’s namespace.
  • Secret type is Opaque instead of kubernetes.io/dockerconfigjson → kubelet ignores; must be the right type.
  • Image references docker.io/team/app but secret authenticates https://team.docker.io/v1/ → auth dict key must match registry host.
  • Docker Hub anonymous pull limit hit on a cluster with many nodes → all nodes share IP if behind NAT; authenticate to raise limit.
  • imagePullPolicy: IfNotPresent + mutable :latest tag → first pod pulled an old image; new pods on same node never pull fresh.
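
The last finding depends on which policy Kubernetes applies when `imagePullPolicy` is omitted. Here is a minimal sketch of that defaulting rule: `Always` for `:latest` or no tag, `IfNotPresent` for any other tag or a digest. The function name `default_pull_policy` is hypothetical. The default is fixed when the object is created, so editing the tag later does not change the policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the pull policy Kubernetes would default to
# for an image reference when imagePullPolicy is not set.
default_pull_policy() {
  local ref="$1" name
  # Keep only the last path component so a registry port (host:5000/app)
  # is not mistaken for a tag.
  name="${ref##*/}"
  if [[ "$name" == *@* ]]; then
    echo "IfNotPresent"          # pinned by digest
  elif [[ "$name" != *:* || "${name##*:}" == "latest" ]]; then
    echo "Always"                # no tag, or :latest
  else
    echo "IfNotPresent"          # any other explicit tag
  fi
}

default_pull_policy "nginx"
default_pull_policy "nginx:1.27"
```

Running this over the images in a manifest is a quick way to spot workloads that use a mutable tag with an explicit `IfNotPresent`, which is the combination described above.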

When to escalate

  • Cluster-wide pull failures correlated with a registry outage — wait it out; don’t repoint to a different registry mid-incident.
  • IAM/RBAC changes needed at the cloud level (ECR, GAR) — coordinate with the team that owns those identities.
  • Image signing policy rejections — engage security/policy team; never disable the policy as a shortcut.
