You are a senior Kubernetes engineer who has debugged hundreds of `ImagePullBackOff` errors across public registries, private registries (ECR, GAR, ACR, GHCR, Harbor, Quay), and air-gapped clusters.

I will provide:
- The pod that's stuck in `ImagePullBackOff` or `ErrImagePull`
- `kubectl describe pod <pod>` output (Events section is the key part)
- The image reference being pulled
- The image pull secrets attached to the pod's ServiceAccount or pod spec
- The container runtime (containerd / CRI-O) and version
- Whether the cluster has airgap / proxy / image policy webhooks

Your job:

1. **Decode the exact error message** in the Events section. The messages are informative:
   - `repository does not exist or may require 'docker login'` → image name typo OR private registry without credentials
   - `pull access denied for ... unauthorized: incorrect username or password` → credentials wrong/expired
   - `manifest unknown` → tag doesn't exist (typo, or tag was deleted)
   - `i/o timeout` / `dial tcp ... connect: connection refused` → network reach failure
   - `x509: certificate signed by unknown authority` → registry uses self-signed CA not trusted by node
   - `error validating signature` → image signing (cosign/notary) policy rejected
   - `too many requests` → registry rate limit (Docker Hub anonymous limit, ECR throttling)
   - `requested access to the resource is denied` (often + `unauthorized`) → IAM/registry RBAC issue
   - `BadRequest: ... no kernel modules` → not actually image; misleading runtime error
2. **Walk the pull chain**:
   - Pod spec → ServiceAccount imagePullSecrets → kubelet → container runtime → registry
   - `imagePullSecrets` MUST be in the same namespace as the pod
   - SA `imagePullSecrets` are copied into the pod at admission only when the pod spec sets none of its own; they are not merged with pod-level `spec.imagePullSecrets`
   - kubelet reads the secret on each pull (not cached aggressively)
3. **Common root causes by registry**:
   - **Docker Hub**: anonymous rate limit hit (100 pulls/6h per IP); authenticate to get higher limit
   - **ECR**: short-lived tokens; `ecr-credential-provider` or sidecar-refreshing secret needed
   - **GAR (Google Artifact Registry)**: GCP IAM via Workload Identity; secret-based auth is fragile
   - **ACR**: managed identity preferred; AKS often has it automatic
   - **GHCR**: GitHub PAT with `read:packages` scope; package visibility matters
   - **Harbor / self-hosted**: TLS cert trust; registry `config_path` / `hosts.toml` on containerd; HTTPS vs HTTP
   - **Air-gapped**: image must exist in the local mirror; `imagePullPolicy: IfNotPresent` to avoid pull
4. **Check `imagePullPolicy`**:
   - `Always` (default for `:latest` and no tag): every pod start pulls
   - `IfNotPresent`: pull only if not cached
   - `Never`: only use local; useful for pre-loaded images on air-gapped nodes
5. **For private registries**, verify the imagePullSecret content:
   - Type must be `kubernetes.io/dockerconfigjson`
   - `.dockerconfigjson` field must contain a valid auth dict for the registry HOST in the image reference
   - `docker.io/library/nginx` and `nginx` both authenticate against `https://index.docker.io/v1/`, so the auth key must match that
6. **For self-signed CA registries**, the node's container runtime must trust the CA:
   - containerd: `/etc/containerd/certs.d/<host>/ca.crt` or insecure setting
   - CRI-O: `/etc/containers/certs.d/<host>/ca.crt`
7. **For signature policy failures**: `cosign verify` from the node identity; check policy controller (Kyverno / sigstore-policy-controller) logs.
8. Mark anything DESTRUCTIVE: setting `imagePullPolicy: Never` on a production workload (locks pods to the currently cached image; new nodes fail), force-pull bypass scripts.

---

Pod + namespace: [DESCRIBE]
Image reference (full): [e.g., `myregistry.com/team/app:v1.2.3`]
Container runtime: [containerd / CRI-O + version]

`kubectl describe pod <pod>` (Events section especially):
```
[PASTE]
```

ServiceAccount and its imagePullSecrets:
```yaml
[PASTE kubectl get sa <sa> -o yaml]
```

Pod-spec imagePullSecrets (if any):
```yaml
[PASTE relevant part of pod yaml]
```

The secret content (sanitized; confirm registry HOST and type):
```yaml
[PASTE kubectl get secret <sec> -n <ns> -o yaml]
```

Why this prompt works

ImagePullBackOff has a short and predictable list of causes, but the user-facing message is often misleading (“repository does not exist” can mean wrong name OR private without auth). This prompt forces the model to decode the precise error string before suggesting fixes.

How to use it

Always paste the Events section verbatim. The kubelet logs the exact registry response there — it’s diagnostic gold.
Verify the image reference is exactly what you intend — typos in repo path or tag are the #1 cause.
For private registries, confirm the secret content (sanitized) — wrong host key in the auth dict is invisible from the secret name alone.
Check both namespaces: the secret must live in the pod’s namespace; cross-namespace references don’t work.
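
To make the host check concrete, here is a minimal shell sketch that derives the registry host an image reference resolves to. It assumes the usual Docker reference convention: the first path component counts as a registry only if it contains a dot or a colon, or is `localhost`; anything else falls back to Docker Hub. The function name `registry_host` is hypothetical, not part of any tool.

```shell
# Print the registry host that an image reference resolves to.
# Hypothetical helper for illustration only.
registry_host() {
  ref="$1"
  case "$ref" in
    */*) first="${ref%%/*}" ;;          # text before the first slash
    *)   echo "docker.io"; return 0 ;;  # bare name such as nginx:1.27
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;   # looks like a host (dot, port, or localhost)
    *)                 echo "docker.io" ;; # e.g. team/app is a Docker Hub namespace
  esac
}

registry_host "myregistry.com/team/app:v1.2.3"
registry_host "team/app:v1"
```

Compare the printed host with the keys under `auths` in the decoded secret. For Docker Hub the key is conventionally `https://index.docker.io/v1/` rather than the bare host.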

Useful commands

# Diagnose
kubectl describe pod <pod>
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].state}'
kubectl get events --field-selector involvedObject.name=<pod>,type=Warning

# ServiceAccount and pod-level secrets
kubectl get sa <sa> -n <ns> -o yaml
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.imagePullSecrets}'

# Inspect a Docker pull secret
kubectl get secret <secret> -n <ns> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq

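# Test pull from a node directly (if you can SSH); imagePullSecrets are not used here,
# so pass credentials explicitly for private registries (e.g. crictl pull --creds <user>:<pass> <image>)
sudo crictl pull <image>          # containerd / CRI-O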
sudo ctr -n=k8s.io images pull <image>   # containerd directly
sudo podman pull <image>          # if podman installed

# Test registry reachability from a debug pod (pod network; kubelet pulls use the node's network, which may differ)
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add curl; curl -v https://<registry>/v2/"

# Check registry endpoint reachability + TLS
kubectl run debug --rm -it --image=alpine --restart=Never -- \
  sh -c "apk add openssl; openssl s_client -connect <registry>:443 -servername <registry> < /dev/null"

# Create a docker-registry secret (in the pod's namespace; --docker-email is optional)
kubectl create secret docker-registry myregcred \
  -n <ns> \
  --docker-server=<registry-host> \
  --docker-username=<user> \
  --docker-password=<pass> \
  --docker-email=<email>

# Attach to a ServiceAccount
kubectl patch sa default -n <ns> -p '{"imagePullSecrets":[{"name":"myregcred"}]}'

# Recreate pods; the image is re-pulled only if imagePullPolicy is Always or the node has no cached copy
kubectl rollout restart deploy <deploy> -n <ns>

Error message decoder

Error fragment	Most likely cause	Fix
`repository does not exist or may require 'docker login'`	Typo OR private without auth	Verify reference; add imagePullSecret
`pull access denied ... unauthorized`	Credentials wrong/expired	Rotate secret; check token TTL
`manifest unknown`	Tag doesn’t exist	Verify tag at registry; rebuild
`manifest for ... not found`	Same as above	Same
`i/o timeout`	Network reach	Check egress, NetworkPolicy, proxy
`connection refused`	Wrong registry port or registry down	`nc -vz <host> 443` from a debug pod
`x509: certificate signed by unknown authority`	Self-signed CA not trusted	Install CA on nodes (containerd certs.d)
`too many requests`	Rate limit	Authenticate; cache; mirror
`invalid reference format`	Bad image string (e.g., uppercase letters in the repository name, stray spaces)	Lowercase the repository path; remove spaces
`Got permission denied while ...`	Filesystem on node, not registry	Check kubelet write perms
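
The table above can be sketched as a small shell classifier that maps a pasted Events message to a coarse cause. This is an illustrative sketch only: the function name `classify_pull_error` and the category labels are hypothetical, and the patterns cover just the fragments listed in the table. The "repository does not exist" check runs before the generic credential check because that message usually also contains "pull access denied".

```shell
# Map an ImagePullBackOff event message to a coarse cause.
# Hypothetical helper; labels are made up for this example.
classify_pull_error() {
  case "$1" in
    *"manifest unknown"*|*"manifest for "*" not found"*)
      echo "tag-missing" ;;
    *"x509: certificate signed by unknown authority"*)
      echo "untrusted-ca" ;;
    *"too many requests"*)
      echo "rate-limited" ;;
    *"i/o timeout"*|*"connection refused"*)
      echo "network" ;;
    *"invalid reference format"*)
      echo "bad-reference" ;;
    *"repository does not exist"*)
      echo "name-or-auth" ;;      # typo OR private repo without credentials
    *"unauthorized"*|*"pull access denied"*)
      echo "credentials" ;;
    *)
      echo "unknown" ;;
  esac
}

classify_pull_error "dial tcp 10.0.0.1:443: i/o timeout"
```

A classifier like this only narrows the search; the Fix column still has to be applied with the full message in hand.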

ECR credential pattern (preferred over static)

# kubelet flags (set on every node; these are command-line flags, not KubeletConfiguration fields)
--image-credential-provider-config=/etc/kubernetes/credential-providers.yaml
--image-credential-provider-bin-dir=/usr/local/bin

# /etc/kubernetes/credential-providers.yaml
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
  matchImages:
  - "*.dkr.ecr.*.amazonaws.com"
  defaultCacheDuration: "12h"
  apiVersion: credentialprovider.kubelet.k8s.io/v1

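This avoids ECR token-expiry pain: ECR authorization tokens are valid for 12 hours, and the kubelet asks the provider for a fresh one when its cached credential expires, so there is no static secret to rotate.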
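
A common reason the provider is never invoked is a `matchImages` pattern that does not match the registry host. The sketch below approximates the documented matching rule for the host part only: each `*` matches within a single dot-separated segment, and the number of segments must be equal. The function name `match_image_host` is hypothetical, and ports and paths are deliberately ignored here.

```shell
# Return 0 if a matchImages-style host pattern matches a registry host.
# Hypothetical helper; approximates the kubelet rule for hosts only.
match_image_host() {
  pattern="$1"
  host="$2"
  while :; do
    p_seg="${pattern%%.*}"   # current pattern segment
    h_seg="${host%%.*}"      # current host segment
    # Unquoted pattern so '*' acts as a wildcard within this one segment.
    case "$h_seg" in
      $p_seg) ;;
      *) return 1 ;;
    esac
    case "$pattern" in *.*) p_more=1 ;; *) p_more=0 ;; esac
    case "$host"    in *.*) h_more=1 ;; *) h_more=0 ;; esac
    # Both sides must run out of segments at the same time.
    if [ "$p_more" != "$h_more" ]; then return 1; fi
    if [ "$p_more" = 0 ]; then return 0; fi
    pattern="${pattern#*.}"
    host="${host#*.}"
  done
}

match_image_host '*.dkr.ecr.*.amazonaws.com' \
  '123456789012.dkr.ecr.us-east-1.amazonaws.com' && echo "provider will be used"
```

Under this rule the pattern above does not cover hosts ending in `amazonaws.com.cn`, which have one more segment, so clusters in those partitions would need an additional `matchImages` entry.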

Common findings this catches

Image typo: nginx:1.27 exists, ngnix:1.27 doesn’t — copy from registry directly.
Pod in default namespace, secret in kube-system → ImagePullSecrets are namespace-scoped. Copy secret to pod’s namespace.
Secret type is Opaque instead of kubernetes.io/dockerconfigjson → kubelet ignores; must be the right type.
Image references docker.io/team/app but secret authenticates https://team.docker.io/v1/ → auth dict key must match registry host.
Docker Hub anonymous pull limit hit on a cluster with many nodes → all nodes share IP if behind NAT; authenticate to raise limit.
imagePullPolicy: IfNotPresent + mutable :latest tag → first pod pulled an old image; new pods on same node never pull fresh.
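
The last finding is easier to reason about once you know which policy Kubernetes applies when `imagePullPolicy` is omitted. The sketch below encodes the documented defaulting rule: digest references and non-`latest` tags default to `IfNotPresent`, while `:latest` or no tag defaults to `Always`. The function name `default_pull_policy` is hypothetical, and the default is set when the object is created, so changing the tag later does not update it.

```shell
# Print the imagePullPolicy Kubernetes would default to for an image reference.
# Hypothetical helper for illustration only.
default_pull_policy() {
  name="${1##*/}"   # last path component, so a registry port is not mistaken for a tag
  case "$name" in
    *@*)      echo "IfNotPresent" ;;  # pinned by digest
    *:latest) echo "Always" ;;
    *:*)      echo "IfNotPresent" ;;  # any other explicit tag
    *)        echo "Always" ;;        # no tag is treated as :latest
  esac
}

default_pull_policy "myregistry.com/team/app:v1.2.3"
default_pull_policy "nginx"
```

If a workload sets `IfNotPresent` explicitly on a mutable tag, pinning by digest or switching to immutable tags removes the stale-cache problem without forcing a pull on every start.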

When to escalate

Cluster-wide pull failures correlated with a registry outage — wait it out; don’t repoint to a different registry mid-incident.
IAM/RBAC changes needed at the cloud level (ECR, GAR) — coordinate with the team that owns those identities.
Image signing policy rejections — engage security/policy team; never disable the policy as a shortcut.
