Kubernetes ImagePullBackOff Debugging Prompt
Diagnose `ImagePullBackOff` / `ErrImagePull` — wrong image name, private registry auth, imagePullSecrets, image signing/content trust, network reach to the registry.
- Target user: Kubernetes engineers debugging pod startup failures
- Difficulty: Beginner
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes engineer who has debugged hundreds of `ImagePullBackOff` errors across public registries, private registries (ECR, GAR, ACR, GHCR, Harbor, Quay), and air-gapped clusters.

I will provide:
- The pod that's stuck in `ImagePullBackOff` or `ErrImagePull`
- `kubectl describe pod <pod>` output (Events section is the key part)
- The image reference being pulled
- The image pull secrets attached to the pod's ServiceAccount or pod spec
- The container runtime (containerd / CRI-O) and version
- Whether the cluster has airgap / proxy / image policy webhooks

Your job:

1. **Decode the exact error message** in the Events section; the messages are informative:
   - `repository does not exist or may require 'docker login'` → image name typo OR private registry without credentials
   - `pull access denied for ... unauthorized: incorrect username or password` → credentials wrong/expired
   - `manifest unknown` → tag doesn't exist (typo, or tag was deleted)
   - `i/o timeout` / `dial tcp ... connect: connection refused` → network reach failure
   - `x509: certificate signed by unknown authority` → registry uses self-signed CA not trusted by node
   - `error validating signature` → image signing (cosign/notary) policy rejected
   - `too many requests` → registry rate limit (Docker Hub anonymous limit, ECR throttling)
   - `requested access to the resource is denied` (often + `unauthorized`) → IAM/registry RBAC issue
   - `BadRequest: ... no kernel modules` → not actually an image problem; misleading runtime error
2. **Walk the pull chain**:
   - Pod spec → ServiceAccount imagePullSecrets → kubelet → container runtime → registry
   - `imagePullSecrets` MUST be in the same namespace as the pod
   - SA `imagePullSecrets` get merged with pod-level `spec.imagePullSecrets`
   - kubelet reads the secret on each pull (not cached aggressively)
3. **Common root causes by registry**:
   - **Docker Hub**: anonymous rate limit hit (100 pulls/6h per IP); authenticate to get higher limit
   - **ECR**: short-lived tokens; `aws-ecr-credential-provider` or sidecar-refreshing secret needed
   - **GAR (Google Artifact Registry)**: GCP IAM via Workload Identity; secret-based auth is fragile
   - **ACR**: managed identity preferred; AKS often has it automatic
   - **GHCR**: GitHub PAT with `read:packages` scope; package visibility matters
   - **Harbor / self-hosted**: TLS cert trust; registry `config_path` in containerd's `config.toml`; HTTPS vs HTTP
   - **Air-gapped**: image must exist in the local mirror; `imagePullPolicy: IfNotPresent` to avoid pull
4. **Check `imagePullPolicy`**:
   - `Always` (default for `:latest` and no tag): every pod start pulls
   - `IfNotPresent`: pull only if not cached
   - `Never`: only use local; useful for pre-loaded images on air-gapped nodes
5. **For private registries**, verify the imagePullSecret content:
   - Type must be `kubernetes.io/dockerconfigjson`
   - `.dockerconfigjson` field must contain a valid auth dict for the registry HOST in the image reference
   - `docker.io/library/nginx` and `nginx` both authenticate against `https://index.docker.io/v1/`; the secret's host key must match
6. **For self-signed CA registries**, the node's container runtime must trust the CA:
   - containerd: `/etc/containerd/certs.d/<host>/ca.crt` or insecure setting
   - CRI-O: `/etc/containers/certs.d/<host>/ca.crt`
7. **For signature policy failures**: `cosign verify` from the node identity; check policy controller (Kyverno / sigstore-policy-controller) logs.
8. Mark anything DESTRUCTIVE: setting `imagePullPolicy: Never` on a production workload (locks pods to the current cached image; new nodes fail), force-pull bypass scripts.

---

Pod + namespace: [DESCRIBE]

Image reference (full): [e.g., `myregistry.com/team/app:v1.2.3`]

Container runtime: [containerd / CRI-O + version]

`kubectl describe pod <pod>` (Events section especially):
```
[PASTE]
```

ServiceAccount and its imagePullSecrets:
```yaml
[PASTE kubectl get sa <sa> -o yaml]
```

Pod-spec imagePullSecrets (if any):
```yaml
[PASTE relevant part of pod yaml]
```

The secret content (sanitized; confirm registry HOST and type):
```yaml
[PASTE kubectl get secret <sec> -n <ns> -o yaml]
```
Why this prompt works
ImagePullBackOff has a short and predictable list of causes, but the user-facing message is often misleading (“repository does not exist” can mean wrong name OR private without auth). This prompt forces the model to decode the precise error string before suggesting fixes.
How to use it
- Always paste the Events section verbatim. The kubelet logs the exact registry response there — it’s diagnostic gold.
- Verify the image reference is exactly what you intend — typos in repo path or tag are the #1 cause.
- For private registries, confirm the secret content (sanitized) — wrong host key in the auth dict is invisible from the secret name alone.
- Check both namespaces: the secret must be in the pod’s namespace; cross-namespace references don’t work.
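The host-key mismatch from the third tip is easier to spot if you first work out which registry host the image reference resolves to. The following is a minimal sketch, not the kubelet's own logic: `registry_host` and `auth_key` are hypothetical helper names, and the parsing rule (the first path component counts as a registry only if it contains `.` or `:` or is `localhost`) is the usual Docker reference convention, simplified.

```shell
#!/bin/sh
# registry_host IMAGE_REF
# Prints the registry host an image reference resolves to.
# Simplified convention: the first path component is a registry only if it
# contains "." or ":" or is exactly "localhost"; otherwise it is Docker Hub.
registry_host() {
  ref=$1
  case "$ref" in
    */*) first=${ref%%/*} ;;   # text before the first "/"
    *)   first="" ;;           # bare name such as "nginx" or "nginx:1.27"
  esac
  case "$first" in
    localhost|*.*|*:*) printf '%s\n' "$first" ;;
    *)                 printf '%s\n' "docker.io" ;;
  esac
}

# auth_key IMAGE_REF
# Prints the key to look for under "auths" in the pull secret.
# Docker Hub is mapped to https://index.docker.io/v1/, as stated in the prompt.
auth_key() {
  host=$(registry_host "$1")
  if [ "$host" = "docker.io" ]; then
    printf '%s\n' "https://index.docker.io/v1/"
  else
    printf '%s\n' "$host"
  fi
}
```

For example, `auth_key myregistry.com/team/app:v1.2.3` prints `myregistry.com`, which is the key you would then expect to find in the decoded secret.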
Useful commands
# Diagnose
kubectl describe pod <pod>
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].state}'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning
# ServiceAccount and pod-level secrets
kubectl get sa <sa> -n <ns> -o yaml
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.imagePullSecrets}'
# Inspect a Docker pull secret
kubectl get secret <secret> -n <ns> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq
# Test pull from a node directly (if you can SSH)
sudo crictl pull <image> # containerd / CRI-O
sudo ctr -n=k8s.io images pull <image> # containerd directly
sudo podman pull <image> # if podman installed
# Test registry API reachability from inside the cluster (debug pod)
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add --no-cache curl && curl -v https://<registry>/v2/"
# Check registry endpoint reachability + TLS
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add --no-cache openssl && openssl s_client -connect <registry>:443 -servername <registry> < /dev/null"
# Create a docker-registry secret (in the pod's namespace)
kubectl create secret docker-registry myregcred -n <ns> \
  --docker-server=<registry-host> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --docker-email=<email>
# Attach to a ServiceAccount
kubectl patch sa default -n <ns> -p '{"imagePullSecrets":[{"name":"myregcred"}]}'
# Force re-pull (delete pods so they recreate; image cache on node may still be used)
kubectl rollout restart deploy <deploy>
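When inspecting a decoded pull secret, it helps to know what a well-formed entry looks like. The sketch below rebuilds a minimal `.dockerconfigjson` document by hand so you can compare it with the decoded output. `docker_auth` and `dockerconfigjson` are hypothetical helper names; the optional `email` field is left out, and values are not JSON-escaped, so this is for illustration with simple test credentials only.

```shell
#!/bin/sh
# docker_auth USER PASSWORD
# Prints the value expected in the "auth" field: base64 of "USER:PASSWORD".
docker_auth() {
  # tr strips any line wrapping that base64 may add for long inputs.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
  printf '\n'
}

# dockerconfigjson HOST USER PASSWORD
# Prints a minimal .dockerconfigjson document for a single registry host.
# Assumption: no quotes or backslashes in the values (no JSON escaping here).
dockerconfigjson() {
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$1" "$2" "$3" "$(docker_auth "$2" "$3")"
}
```

If the `auth` value in the real secret does not decode to the `user:password` pair you expect, or the key under `auths` is not the registry host from the image reference, the secret is the likely culprit.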
Error message decoder
| Error fragment | Most likely cause | Fix |
|---|---|---|
| `repository does not exist or may require 'docker login'` | Typo OR private without auth | Verify reference; add imagePullSecret |
| `pull access denied ... unauthorized` | Credentials wrong/expired | Rotate secret; check token TTL |
| `manifest unknown` | Tag doesn’t exist | Verify tag at registry; rebuild |
| `manifest for ... not found` | Same as above | Same |
| `i/o timeout` | Network reach | Check egress, NetworkPolicy, proxy |
| `connection refused` | Wrong registry port or registry down | `nc -vz <host> 443` from a debug pod |
| `x509: certificate signed by unknown authority` | Self-signed CA not trusted | Install CA on nodes (containerd certs.d) |
| `too many requests` | Rate limit | Authenticate; cache; mirror |
| `invalid reference format` | Bad image string (e.g., uppercase letters in the repository name, spaces) | Lowercase the repository path; remove spaces |
| `Got permission denied while ...` | Filesystem on node, not registry | Check kubelet write perms |
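The decoder table can be turned into a first-pass triage helper. This is a rough sketch rather than a complete classifier: `classify_pull_error` and its category labels are hypothetical names, and the `toomanyrequests` spelling is an extra pattern added here on the assumption that some registries report the rate limit as a single-word error code. Order matters, because Docker Hub often combines "pull access denied" and "repository does not exist" in one message.

```shell
#!/bin/sh
# classify_pull_error MESSAGE
# Maps an Events message to a coarse cause label taken from the decoder table.
classify_pull_error() {
  # Lowercase first so "Too Many Requests" and "too many requests" both match.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"repository does not exist"*)                     echo "typo-or-missing-auth" ;;
    *"manifest unknown"*|*"manifest for"*"not found"*) echo "tag-missing" ;;
    *"x509: certificate signed by unknown authority"*) echo "untrusted-ca" ;;
    *"too many requests"*|*"toomanyrequests"*)         echo "rate-limit" ;;
    *"i/o timeout"*)                                   echo "network-timeout" ;;
    *"connection refused"*)                            echo "registry-unreachable" ;;
    *"invalid reference format"*)                      echo "bad-reference" ;;
    *"pull access denied"*|*"unauthorized"*)           echo "bad-credentials" ;;
    *)                                                 echo "unknown" ;;
  esac
}
```

Feed it the message text from the pod's Warning events; an `unknown` result means the message should be read by hand rather than guessed at.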
ECR credential pattern (preferred over static)
# kubelet flags (node-level; how they are set depends on your distribution)
--image-credential-provider-config=/etc/kubernetes/credential-providers.yaml
--image-credential-provider-bin-dir=/usr/local/bin

# /etc/kubernetes/credential-providers.yaml
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1
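This avoids ECR token-expiry pain: ECR authorization tokens are valid for 12 hours, so a static pull secret goes stale unless something refreshes it.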
Common findings this catches
- Image typo: `nginx:1.27` exists, `ngnix:1.27` doesn’t. Copy the reference from the registry directly.
- Pod in `default` namespace, secret in `kube-system` → imagePullSecrets are namespace-scoped. Copy the secret to the pod’s namespace.
- Secret type is `Opaque` instead of `kubernetes.io/dockerconfigjson` → kubelet ignores it; must be the right type.
- Image references `docker.io/team/app` but secret authenticates `https://team.docker.io/v1/` → auth dict key must match registry host.
- Docker Hub anonymous pull limit hit on a cluster with many nodes → all nodes share an IP if behind NAT; authenticate to raise the limit.
- `imagePullPolicy: IfNotPresent` + mutable `:latest` tag → first pod pulled an old image; new pods on the same node never pull fresh.
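The last finding depends on which policy a container ends up with when `imagePullPolicy` is omitted. The sketch below mirrors the documented Kubernetes defaulting rule (digest → `IfNotPresent`, `:latest` or no tag → `Always`, any other tag → `IfNotPresent`). `default_pull_policy` is a hypothetical helper name, and the reference parsing is simplified.

```shell
#!/bin/sh
# default_pull_policy IMAGE_REF
# Prints the imagePullPolicy Kubernetes would default to for this reference
# when the field is omitted from the container spec.
default_pull_policy() {
  ref=$1
  case "$ref" in
    *@*) echo "IfNotPresent"; return ;;   # pinned by digest
  esac
  # Look only at the last path component so a registry port
  # (e.g. localhost:5000/app) is not mistaken for a tag.
  name=${ref##*/}
  case "$name" in
    *:latest) echo "Always" ;;
    *:*)      echo "IfNotPresent" ;;
    *)        echo "Always" ;;            # no tag at all
  esac
}
```

A workload that sets `IfNotPresent` explicitly on a `:latest` image overrides this default, which is how the stale-image finding arises.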
When to escalate
- Cluster-wide pull failures correlated with a registry outage — wait it out; don’t repoint to a different registry mid-incident.
- IAM/RBAC changes needed at the cloud level (ECR, GAR) — coordinate with the team that owns those identities.
- Image signing policy rejections — engage security/policy team; never disable the policy as a shortcut.
Related prompts
- Dockerfile Security Review Prompt: AI security review of a Dockerfile covering privilege, attack surface, secrets in layers, vulnerable bases, and supply-chain risk.
- Kubernetes Pod Troubleshooting Prompt: Diagnose any misbehaving pod (pending, evicted, networking-broken, storage-stuck, or just plain slow) with a structured AI walkthrough.
- Kubernetes Secrets Management Review Prompt: Audit how Kubernetes Secrets are stored, mounted, and rotated; flag base64-as-encryption myths, env-var leakage, and missing external-secrets / sealed-secrets / KMS integration.