Azure Error Guide: 'ImagePullBackOff' AKS Failing to Pull from ACR
Fix ImagePullBackOff / ErrImagePull 401 Unauthorized on AKS pulling from ACR: diagnose missing AcrPull role, kubelet identity, ACR firewall, image tags, and cross-tenant pulls.
- #azure
- #troubleshooting
- #errors
- #aks
Overview
ImagePullBackOff happens when a Kubernetes node cannot pull a container image and backs off after repeated ErrImagePull failures. On AKS pulling from Azure Container Registry (ACR), the most common cause is authorization: the cluster’s kubelet identity lacks AcrPull on the registry, so the node has no valid credential and the registry rejects the pull with 401 Unauthorized. The pod stays in Pending (container state Waiting) and never starts.
You will see this in kubectl get pods and the pod events:
NAME READY STATUS RESTARTS AGE
orders-api-7d9f8c6b4-x2kqp 0/1 ImagePullBackOff 0 3m
And in kubectl describe pod, the underlying error:
Warning Failed 2m (x4 over 3m) kubelet Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": failed to pull and unpack image "prodacr.azurecr.io/orders-api:v1.4.2": failed to resolve reference "prodacr.azurecr.io/orders-api:v1.4.2": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://prodacr.azurecr.io/oauth2/token?...: 401 Unauthorized
Warning Failed 2m (x4 over 3m) kubelet Error: ErrImagePull
Warning Failed 90s (x6 over 3m) kubelet Error: ImagePullBackOff
It occurs after the pod has been scheduled, when the kubelet on the node tries to pull the image and authenticate to ACR. The exact cause (auth, image name, network, or identity) is in the describe event text, so read that first.
Symptoms
- Pod stuck in ImagePullBackOff/ErrImagePull, never reaching Running.
- kubectl describe pod shows 401 Unauthorized or not found from the ACR endpoint.
- New deployments fail to pull while older cached images keep running.
- Works in one cluster/subscription but not another.
kubectl describe pod orders-api-7d9f8c6b4-x2kqp -n prod | grep -A3 -i "failed\|error"
Warning Failed 2m kubelet Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": ... 401 Unauthorized
Warning Failed 2m kubelet Error: ErrImagePull
Warning Failed 90s kubelet Error: ImagePullBackOff
az aks check-acr --resource-group rg-prod --name prod-aks \
--acr prodacr.azurecr.io
[...]
Your cluster cannot pull images from prodacr.azurecr.io.
Error: failed to authorize: 401 Unauthorized. The kubelet identity may not have AcrPull on the registry.
az aks check-acr is the fastest authoritative test: it runs the check from a cluster node using the kubelet identity, so it sees the same DNS, network path, and credentials that the kubelet does.
Common Root Causes
1. AKS not attached to ACR (missing AcrPull for kubelet identity)
The kubelet (node) managed identity needs the AcrPull role on the registry. If the cluster was never attached, pulls fail with 401.
# Find the kubelet identity object ID
KUBELET_OID=$(az aks show -g rg-prod -n prod-aks \
--query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n prodacr --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" \
--query "[].roleDefinitionName" -o tsv
(empty)
No AcrPull assignment for the kubelet identity on the registry — attach the ACR to grant it.
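As a small sketch of how that check could be scripted, the helper below reads role names (one per line, as `-o tsv` prints them) and succeeds only if AcrPull is among them. The function name `has_acrpull` is hypothetical, and the check mirrors the text by looking for the literal role name only; it makes no attempt to account for broader roles that may also permit pulls.

```shell
# has_acrpull: hypothetical helper. Reads role names on stdin, one per line,
# and returns 0 only if a line is exactly "AcrPull".
has_acrpull() {
  grep -qx 'AcrPull'
}

# Sample input standing in for the `az role assignment list ... -o tsv` output.
if printf 'Reader\nAcrPull\n' | has_acrpull; then
  echo "AcrPull present"
else
  echo "AcrPull missing"
fi
```

In practice the role listing command above would be piped straight into `has_acrpull`; empty output makes it return non-zero.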
2. Wrong image name or tag
A typo in the repository name or a tag that does not exist returns not found rather than 401. The image simply is not there.
# Does the tag exist in ACR?
az acr repository show-tags --name prodacr --repository orders-api \
--orderby time_desc -o table
Result
--------
v1.4.1
v1.4.0
latest
The deployment requests v1.4.2, but ACR only has up to v1.4.1 — the tag was never pushed. The describe event will read manifest unknown / not found instead of 401.
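The comparison between the requested tag and the registry's tag list can be sketched in plain shell. This assumes the image reference has the simple form registry/repository:tag (no digest, no registry port); the helper name `tag_exists` is hypothetical, and the tag list is sample data copied from the output above.

```shell
# Assumes the reference has the simple form <registry>/<repository>:<tag>.
# Sample value taken from the pod event shown earlier.
image="prodacr.azurecr.io/orders-api:v1.4.2"

registry=${image%%/*}   # text before the first "/"
rest=${image#*/}        # text after the first "/"
repo=${rest%:*}         # text before the last ":"
tag=${rest##*:}         # text after the last ":"

# tag_exists: hypothetical helper. Reads tag names on stdin, one per line,
# and returns 0 if the requested tag exactly matches one of them.
tag_exists() {
  grep -qxF -e "$1"
}

# Sample tag list standing in for `az acr repository show-tags ... -o tsv`.
if printf 'v1.4.1\nv1.4.0\nlatest\n' | tag_exists "$tag"; then
  echo "${repo}:${tag} exists in ${registry}"
else
  echo "${repo}:${tag} not found in ${registry}"
fi
```

With the sample data, the else branch runs because v1.4.2 is not in the list.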
3. ACR private endpoint / firewall blocking the node
If ACR is Premium with publicNetworkAccess disabled or network rules set to Deny, nodes that are not on the allowed VNet/private endpoint cannot reach the registry.
az acr show -n prodacr \
--query "{sku:sku.name, publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc
{
"sku": "Premium",
"publicAccess": "Disabled",
"default": "Deny"
}
With public access disabled, the node must resolve ACR through a private endpoint with correct DNS. A missing private DNS zone link causes the pull to fail at the network layer.
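One quick way to sanity-check the DNS side is to look at the address a node resolves for the registry hostname: behind a private endpoint it would normally be a private address from the VNet. The sketch below only classifies an IPv4 address as RFC 1918 private or not; the helper name `is_private_ipv4` and the sample address are assumptions, and a VNet using non-RFC 1918 space would need a different check.

```shell
# is_private_ipv4: hypothetical helper. Returns 0 if the address falls in an
# RFC 1918 private range (10/8, 172.16/12, 192.168/16), otherwise 1.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# 10.240.0.12 is a made-up address standing in for what a node resolved
# for the registry hostname (for example via nslookup from a debug pod).
resolved_ip="10.240.0.12"
if is_private_ipv4 "$resolved_ip"; then
  echo "private address: consistent with a private endpoint"
else
  echo "public address: the private DNS zone is probably not in effect"
fi
```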
4. imagePullSecret missing or expired
If the deployment uses an explicit imagePullSecrets (instead of the attached managed identity), an absent or stale secret causes 401.
kubectl get pod orders-api-7d9f8c6b4-x2kqp -n prod \
-o jsonpath='{.spec.imagePullSecrets[*].name}'
kubectl get secret acr-pull-secret -n prod -o jsonpath='{.type}'
acr-pull-secret
kubernetes.io/dockerconfigjson
If the referenced secret is missing, the kubelet has no credential and falls back to anonymous pull (401). If present but built from an expired SP/token, ACR rejects it. Prefer attaching the ACR over managing pull secrets.
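When a pull secret is in use, it can also help to confirm that it actually has an entry for the registry being pulled from. The sketch below does a plain text match on a decoded dockerconfigjson document; the helper name `secret_covers_registry` is hypothetical and the sample document uses placeholder credentials. It cannot tell whether the stored credential has expired.

```shell
# secret_covers_registry: hypothetical helper. Reads a decoded
# .dockerconfigjson document on stdin and returns 0 if it contains the given
# registry hostname as a quoted JSON string (a text match, not a JSON parse).
secret_covers_registry() {
  grep -qF -e "\"$1\""
}

# Sample document with placeholder credentials, in the dockerconfigjson shape.
sample='{"auths":{"prodacr.azurecr.io":{"username":"<SP_APP_ID>","password":"<SP_SECRET>"}}}'
if printf '%s\n' "$sample" | secret_covers_registry "prodacr.azurecr.io"; then
  echo "secret has an entry for the registry"
else
  echo "secret has no entry for the registry"
fi
```

Against a live cluster, the input would typically come from `kubectl get secret acr-pull-secret -n prod -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`.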
5. Kubelet identity vs cluster identity confusion
AKS has two identities: the control-plane (cluster) identity and the kubelet (node) identity. Image pulls use the kubelet identity. Granting AcrPull to the cluster identity does nothing for pulls.
az aks show -g rg-prod -n prod-aks --query "identityProfile" -o jsonc
{
"kubeletidentity": {
"clientId": "aaaa1111-...",
"objectId": "2c4e6a8b-1111-2222-3333-444455556666",
"resourceId": ".../userAssignedIdentities/prod-aks-agentpool"
}
}
The kubeletidentity.objectId is the principal that must hold AcrPull. If you assigned the role to the control-plane identity, move it to this object ID.
6. ACR in a different subscription or tenant
If the ACR lives in another subscription, the role assignment must target that registry’s full resource ID. Across tenants, the kubelet identity cannot be granted AcrPull at all without cross-tenant trust, so pulls always fail with 401.
# Confirm the ACR's subscription/tenant
az acr show -n prodacr --query id -o tsv
az account list --query "[?isDefault].{name:name, tenant:tenantId, sub:id}" -o table
/subscriptions/22222222-.../resourceGroups/rg-registry/providers/Microsoft.ContainerRegistry/registries/prodacr
Name Tenant Sub
---------- ------------------------------------ ------------------------------------
Contoso AKS aaaaaaaa-... 11111111-...
The ACR is in subscription 2222... while AKS is in 1111.... The --attach-acr / role assignment must reference the ACR’s full ID across that subscription boundary; a same-subscription assumption fails.
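The subscription comparison can be sketched by pulling the subscription segment out of each resource ID. The helper name `subscription_of` is hypothetical, and SUB-AKS and SUB-ACR are placeholders rather than real subscription IDs; real values would come from `az aks show --query id` and `az acr show --query id`.

```shell
# subscription_of: hypothetical helper. Prints the subscription segment of an
# Azure resource ID of the form /subscriptions/<sub>/resourceGroups/...
# Splitting on "/" gives an empty field 1, "subscriptions" as field 2,
# and the subscription itself as field 3.
subscription_of() {
  printf '%s\n' "$1" | cut -d/ -f3
}

# SUB-AKS and SUB-ACR are placeholders, not real subscription IDs.
aks_id="/subscriptions/SUB-AKS/resourceGroups/rg-prod/providers/Microsoft.ContainerService/managedClusters/prod-aks"
acr_id="/subscriptions/SUB-ACR/resourceGroups/rg-registry/providers/Microsoft.ContainerRegistry/registries/prodacr"

if [ "$(subscription_of "$aks_id")" = "$(subscription_of "$acr_id")" ]; then
  echo "same subscription: attaching by registry name should resolve"
else
  echo "different subscriptions: attach using the registry's full resource ID"
fi
```

When the subscriptions differ, passing the registry's full resource ID to `--attach-acr` instead of its short name avoids the same-subscription lookup.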
Diagnostic Workflow
Step 1: Read the exact pull error from the pod
kubectl describe pod <POD> -n <NS> | grep -A4 -i "failed to pull"
401 Unauthorized = auth/identity issue; manifest unknown/not found = wrong image/tag; connection/timeout = network/firewall.
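That mapping can be sketched as a small shell filter for use in a runbook. The helper name `classify_pull_error` is hypothetical, and the network substrings (timeout, connection refused, no such host) are assumed examples of what a network failure might contain, not an exhaustive list.

```shell
# classify_pull_error: hypothetical helper. Reads kubelet event text on stdin
# and prints a coarse category: auth, image, network, or unknown.
classify_pull_error() {
  pull_event=$(cat)
  case "$pull_event" in
    *"401 Unauthorized"*|*"failed to authorize"*)
      echo auth ;;
    *"manifest unknown"*|*"not found"*)
      echo image ;;
    *timeout*|*"connection refused"*|*"no such host"*)
      echo network ;;
    *)
      echo unknown ;;
  esac
}

# Sample text abbreviated from the pod event shown in the overview.
echo 'failed to authorize: failed to fetch anonymous token: 401 Unauthorized' \
  | classify_pull_error
```

The auth patterns are checked first, so an event that mentions both a 401 and another keyword is treated as an authorization problem. The `kubectl describe pod` command above can be piped into the function directly.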
Step 2: Run the authoritative connectivity test
az aks check-acr --resource-group <RG> --name <AKS> --acr <REGISTRY>.azurecr.io
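This runs from a cluster node using the kubelet identity and reports whether the registry can be resolved, reached, and authenticated against. It reports on registry access rather than on a particular tag, so still confirm the image itself in Step 4.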
Step 3: Verify the kubelet identity holds AcrPull
KUBELET_OID=$(az aks show -g <RG> -n <AKS> --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n <REGISTRY> --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" --query "[].roleDefinitionName" -o tsv
Empty output means the role is missing — attach the ACR in Step 5.
Step 4: Confirm the image exists and check ACR network rules
az acr repository show-tags --name <REGISTRY> --repository <REPO> --orderby time_desc -o table
az acr show -n <REGISTRY> --query "{publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc
Make sure the tag is present and the node’s network path to ACR is allowed.
Step 5: Attach the ACR (or fix the gap) and re-roll the pods
az aks update -g <RG> -n <AKS> --attach-acr <REGISTRY>
# Wait for role propagation, then force a fresh pull
kubectl rollout restart deployment/<DEPLOY> -n <NS>
kubectl get pods -n <NS> -w
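The "wait for role propagation" step can be made explicit with a small retry loop instead of a fixed pause. This is a sketch under assumptions: the helper name `retry` is hypothetical, and the demo uses the stand-in commands `true` and `false` rather than a real Azure check.

```shell
# retry: hypothetical helper. Runs a command up to <attempts> times, sleeping
# <delay> seconds between tries. Returns 0 on the first success, 1 otherwise.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
  return 0
}

# Demo with stand-in commands; in practice the command would be a check such
# as the Step 3 role listing wrapped in a function that greps for AcrPull.
retry 3 0 true  && echo "ready"
retry 2 0 false || echo "gave up after 2 attempts"
```

A runbook might wrap the Step 3 role listing in a function and call it through `retry` with an attempt count and delay chosen to suit the environment, running the rollout restart only once it succeeds.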
Example Root Cause Analysis
A new deployment orders-api:v1.4.2 rolls out to prod-aks and every pod lands in ImagePullBackOff. Older pods on the previous tag keep running fine.
The pod event names the failure:
Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": ... failed to authorize: ... 401 Unauthorized
A 401 (rather than not found) points at authorization: the registry rejected the pull before the image or tag was ever looked up. Running the built-in check confirms it:
az aks check-acr --resource-group rg-prod --name prod-aks --acr prodacr.azurecr.io
Your cluster cannot pull images from prodacr.azurecr.io.
Error: failed to authorize: 401 Unauthorized.
Checking which identity should hold the role and whether it does:
KUBELET_OID=$(az aks show -g rg-prod -n prod-aks --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n prodacr --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" --query "[].roleDefinitionName" -o tsv
(empty)
The kubelet identity has no AcrPull on the registry. The registry was recently recreated in a different resource group during a migration, so the old attachment no longer applies — but the older running pods kept their already-pulled layers cached, which is why only the new image failed.
Fix: re-attach the ACR to the cluster and restart the deployment:
az aks update -g rg-prod -n prod-aks --attach-acr prodacr
kubectl rollout restart deployment/orders-api -n prod
After role propagation the kubelet authenticates, pulls v1.4.2, and the pods reach Running.
Prevention Best Practices
- Attach ACR to AKS with az aks update --attach-acr rather than hand-managing imagePullSecrets; it grants AcrPull to the correct kubelet identity automatically.
- After any ACR recreation/migration, re-run az aks check-acr from your runbook; cached images hide the broken auth until the next new tag.
- Grant AcrPull to the kubeletidentity.objectId, never the control-plane identity; the two are easy to confuse.
- For private ACR, verify the private DNS zone is linked to the node VNet so nodes resolve the registry before relying on firewall allow lists.
- Keep CI image tags immutable and push before deploy, so not found failures are caught in the pipeline, not on the node.
- For ad-hoc triage, the free incident assistant can classify a pull event as auth vs image vs network and suggest the next command.
Quick Command Reference
# Read the exact pull failure
kubectl describe pod <POD> -n <NS> | grep -A4 -i "failed to pull"
# Authoritative ACR connectivity test (uses kubelet identity)
az aks check-acr --resource-group <RG> --name <AKS> --acr <REGISTRY>.azurecr.io
# Which identity does the kubelet use, and does it hold AcrPull?
az aks show -g <RG> -n <AKS> --query "identityProfile" -o jsonc
KUBELET_OID=$(az aks show -g <RG> -n <AKS> --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n <REGISTRY> --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" -o table
# Does the image/tag exist?
az acr repository show-tags --name <REGISTRY> --repository <REPO> --orderby time_desc -o table
# ACR network posture
az acr show -n <REGISTRY> --query "{sku:sku.name, publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc
# Inspect imagePullSecrets on the pod
kubectl get pod <POD> -n <NS> -o jsonpath='{.spec.imagePullSecrets[*].name}'
# Attach ACR and re-roll
az aks update -g <RG> -n <AKS> --attach-acr <REGISTRY>
kubectl rollout restart deployment/<DEPLOY> -n <NS>
Conclusion
ImagePullBackOff on AKS pulling from ACR is almost always an authorization gap surfaced as 401 Unauthorized. The usual root causes:
- The cluster is not attached to ACR, so the kubelet identity has no AcrPull role.
- The image name or tag is wrong and does not exist in the registry.
- An ACR private endpoint/firewall blocks the node’s network path.
- An explicit imagePullSecret is missing or built from an expired credential.
- AcrPull was granted to the control-plane identity instead of the kubelet identity.
- The ACR lives in a different subscription or tenant, so the role assignment never applies.
Read the describe event first — 401 means fix the identity/attachment, not found means fix the tag, and a timeout means fix the network — then re-run az aks check-acr to confirm before re-rolling the pods.