Azure with AI · By James Joyner IV · 10 min read

Azure Error Guide: 'ImagePullBackOff' AKS Failing to Pull from ACR

Fix ImagePullBackOff / ErrImagePull 401 Unauthorized on AKS pulling from ACR: diagnose missing AcrPull role, kubelet identity, ACR firewall, image tags, and cross-tenant pulls.

  • #azure
  • #troubleshooting
  • #errors
  • #aks

Overview

ImagePullBackOff happens when a Kubernetes node cannot pull a container image and the kubelet backs off after repeated ErrImagePull failures. On AKS pulling from Azure Container Registry (ACR), the most common cause is authorization: the cluster’s kubelet identity lacks AcrPull on the registry, so the pull is rejected with 401 Unauthorized. The pod stays in the Pending phase with its container in the Waiting state and never starts.

You will see this in kubectl get pods and the pod events:

NAME                        READY   STATUS             RESTARTS   AGE
orders-api-7d9f8c6b4-x2kqp  0/1     ImagePullBackOff   0          3m

And in kubectl describe pod, the underlying error:

  Warning  Failed     2m (x4 over 3m)   kubelet  Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": failed to pull and unpack image "prodacr.azurecr.io/orders-api:v1.4.2": failed to resolve reference "prodacr.azurecr.io/orders-api:v1.4.2": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://prodacr.azurecr.io/oauth2/token?...: 401 Unauthorized
  Warning  Failed     2m (x4 over 3m)   kubelet  Error: ErrImagePull
  Warning  Failed     90s (x6 over 3m)  kubelet  Error: ImagePullBackOff

The failure occurs after the pod has been scheduled, when the kubelet on the assigned node tries to authenticate to ACR and pull the image. The exact cause (auth, image name, network, or identity) is in the describe event text, so read that first.

Symptoms

  • Pod stuck in ImagePullBackOff/ErrImagePull, never reaching Running.
  • kubectl describe pod shows 401 Unauthorized or not found from the ACR endpoint.
  • New deployments fail to pull while older cached images keep running.
  • Works in one cluster/subscription but not another.

kubectl describe pod orders-api-7d9f8c6b4-x2kqp -n prod | grep -A3 -i "failed\|error"
  Warning  Failed   2m   kubelet  Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": ... 401 Unauthorized
  Warning  Failed   2m   kubelet  Error: ErrImagePull
  Warning  Failed   90s  kubelet  Error: ImagePullBackOff

az aks check-acr --resource-group rg-prod --name prod-aks \
  --acr prodacr.azurecr.io
[...]
Your cluster cannot pull images from prodacr.azurecr.io.
Error: failed to authorize: 401 Unauthorized. The kubelet identity may not have AcrPull on the registry.

az aks check-acr is the fastest authoritative test: it validates, from inside the cluster, that a node can reach the registry and authenticate to it with the same identity the kubelet uses.

Common Root Causes

1. AKS not attached to ACR (missing AcrPull for kubelet identity)

The kubelet (node) managed identity needs the AcrPull role on the registry. If the cluster was never attached, pulls fail with 401.

# Find the kubelet identity object ID
KUBELET_OID=$(az aks show -g rg-prod -n prod-aks \
  --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n prodacr --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" \
  --query "[].roleDefinitionName" -o tsv
(empty)

No AcrPull assignment for the kubelet identity on the registry — attach the ACR to grant it.
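The empty-output check above can be turned into an explicit yes/no for a runbook. The following is a minimal sketch; acrpull_status is a hypothetical helper name, and it only looks for the exact role name AcrPull, so it would report "missing" for an identity that can pull through a broader role such as AcrPush.

```shell
# Print "present" if the role names on stdin (one per line, as produced by
# `-o tsv`) include AcrPull exactly, otherwise print "missing".
# acrpull_status is a hypothetical helper name used only in this sketch.
acrpull_status() {
  if grep -Fxq 'AcrPull'; then
    echo present
  else
    echo missing
  fi
}

# Example use with the variables from the commands above:
# az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" \
#   --query "[].roleDefinitionName" -o tsv | acrpull_status
```

A result of "missing" corresponds to the empty output shown above.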

2. Wrong image name or tag

A typo in the repository name or a tag that does not exist returns not found rather than 401. The image simply is not there.

# Does the tag exist in ACR?
az acr repository show-tags --name prodacr --repository orders-api \
  --orderby time_desc -o table
Result
--------
v1.4.1
v1.4.0
latest

The deployment requests v1.4.2, but ACR only has up to v1.4.1 — the tag was never pushed. The describe event will read manifest unknown / not found instead of 401.
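To compare the requested tag against the registry without reading the table by eye, the image reference can be split into its parts and the tag checked against the tag list. This is a sketch under stated assumptions: the helper names are hypothetical, and the reference is assumed to have the form registry/repository:tag with no digest and no registry port.

```shell
# Split an image reference of the form registry/repository:tag.
# Assumes a tag is present and no digest (@sha256:...) or registry port is used.
# All function names here are hypothetical helpers for this sketch.
image_registry() {
  printf '%s\n' "${1%%/*}"          # everything before the first slash
}
image_repository() {
  rest=${1#*/}                      # drop the registry and the first slash
  printf '%s\n' "${rest%:*}"        # drop the trailing :tag
}
image_tag() {
  printf '%s\n' "${1##*:}"          # everything after the last colon
}

# Print "present" if the tag in $1 appears as a whole line on stdin,
# otherwise print "missing".
tag_status() {
  if grep -Fxq -- "$1"; then
    echo present
  else
    echo missing
  fi
}

# Example use:
# IMAGE=prodacr.azurecr.io/orders-api:v1.4.2
# az acr repository show-tags --name prodacr \
#   --repository "$(image_repository "$IMAGE")" -o tsv \
#   | tag_status "$(image_tag "$IMAGE")"
```

Against the tag list shown above, the check for v1.4.2 would report "missing".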

3. ACR private endpoint / firewall blocking the node

If ACR is Premium with publicNetworkAccess disabled or network rules set to Deny, nodes that are not on the allowed VNet/private endpoint cannot reach the registry.

az acr show -n prodacr \
  --query "{sku:sku.name, publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc
{
  "sku": "Premium",
  "publicAccess": "Disabled",
  "default": "Deny"
}

With public access disabled, the node must resolve ACR through a private endpoint with correct DNS. A missing private DNS zone link causes the pull to fail at the network layer.
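One way to spot a missing private DNS zone link is to resolve the registry name from inside the node network and look at the address that comes back: with a working private endpoint it should be a private address. The sketch below only classifies an IPv4 address you have already obtained; ip_scope is a hypothetical helper, and how you run the lookup (for example nslookup from a debug pod) is an assumption about your environment.

```shell
# Print "private" if the IPv4 address in $1 is in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), otherwise print "public".
# ip_scope is a hypothetical helper name used only in this sketch.
ip_scope() {
  case "$1" in
    10.*|192.168.*)
      echo private ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*)
      echo private ;;
    *)
      echo public ;;
  esac
}

# Example use, with an address taken from a lookup run inside the node VNet:
# ip_scope 10.240.0.12
```

A "public" result for the registry name, while public access is disabled, suggests the node is not resolving through the private endpoint.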

4. imagePullSecret missing or expired

If the deployment uses an explicit imagePullSecrets (instead of the attached managed identity), an absent or stale secret causes 401.

kubectl get pod orders-api-7d9f8c6b4-x2kqp -n prod \
  -o jsonpath='{.spec.imagePullSecrets[*].name}'
acr-pull-secret
kubectl get secret acr-pull-secret -n prod -o jsonpath='{.type}'
kubernetes.io/dockerconfigjson

If the referenced secret is missing, the kubelet has no credential and falls back to anonymous pull (401). If present but built from an expired SP/token, ACR rejects it. Prefer attaching the ACR over managing pull secrets.
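A secret can exist with the right type and still not cover the registry being pulled from. As a rough sketch, the decoded secret can be checked for the registry host name. secret_registry_status is a hypothetical helper; it does a plain text match rather than parsing JSON, and it assumes the registry appears as a bare host name key (an entry stored as a full URL would not match).

```shell
# Print "listed" if the decoded .dockerconfigjson on stdin contains the
# registry in $1 as a quoted string, otherwise print "absent".
# This is a plain text match, not a JSON parse.
# secret_registry_status is a hypothetical helper name for this sketch.
secret_registry_status() {
  if grep -Fq -- "\"$1\""; then
    echo listed
  else
    echo absent
  fi
}

# Example use:
# kubectl get secret acr-pull-secret -n prod \
#   -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d \
#   | secret_registry_status prodacr.azurecr.io
```

This does not tell you whether the credential is still valid, only whether the secret targets the right registry.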

5. Kubelet identity vs cluster identity confusion

AKS has two identities: the control-plane (cluster) identity and the kubelet (node) identity. Image pulls use the kubelet identity. Granting AcrPull to the cluster identity does nothing for pulls.

az aks show -g rg-prod -n prod-aks --query "identityProfile" -o jsonc
{
  "kubeletidentity": {
    "clientId": "aaaa1111-...",
    "objectId": "2c4e6a8b-1111-2222-3333-444455556666",
    "resourceId": ".../userAssignedIdentities/prod-aks-agentpool"
  }
}

The kubeletidentity.objectId is the principal that must hold AcrPull. If you assigned the role to the control-plane identity, move it to this object ID.

6. ACR in a different subscription or tenant

If the ACR lives in another subscription, the role assignment must target that registry’s full resource ID. Across tenants, the kubelet identity cannot be granted AcrPull at all without cross-tenant trust, so pulls fail with 401 every time.

# Confirm the ACR's subscription/tenant
az acr show -n prodacr --query id -o tsv
/subscriptions/22222222-.../resourceGroups/rg-registry/providers/Microsoft.ContainerRegistry/registries/prodacr
az account list --query "[?isDefault].{name:name, tenant:tenantId, sub:id}" -o table
Name        Tenant                                Sub
----------  ------------------------------------  ------------------------------------
Contoso AKS aaaaaaaa-...                          11111111-...

The ACR is in subscription 2222... while AKS is in 1111.... The --attach-acr / role assignment must reference the ACR’s full ID across that subscription boundary; a same-subscription assumption fails.
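The subscription comparison can be made mechanical by extracting the subscription segment from each resource ID. This is a small sketch; subscription_of and subscription_match are hypothetical helper names, and the IDs are assumed to start with the usual /subscriptions/ prefix.

```shell
# Print the subscription ID segment of an Azure resource ID.
# Assumes the ID starts with "/subscriptions/<id>/...".
# Both function names are hypothetical helpers for this sketch.
subscription_of() {
  id=${1#/subscriptions/}           # drop the leading prefix
  printf '%s\n' "${id%%/*}"         # keep everything up to the next slash
}

# Print "same" if two resource IDs share a subscription, else "different".
subscription_match() {
  if [ "$(subscription_of "$1")" = "$(subscription_of "$2")" ]; then
    echo same
  else
    echo different
  fi
}

# Example use:
# AKS_ID=$(az aks show -g rg-prod -n prod-aks --query id -o tsv)
# ACR_ID=$(az acr show -n prodacr --query id -o tsv)
# subscription_match "$AKS_ID" "$ACR_ID"
```

A "different" result is the cue to pass the registry’s full resource ID rather than its short name when attaching.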

Diagnostic Workflow

Step 1: Read the exact pull error from the pod

kubectl describe pod <POD> -n <NS> | grep -A4 -i "failed to pull"

401 Unauthorized = auth/identity issue; manifest unknown/not found = wrong image/tag; connection/timeout = network/firewall.
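The mapping above can be sketched as a small classifier for use in scripts or alerts. classify_pull_error and its category labels are hypothetical names for this sketch; the 401, manifest unknown, not found, and timeout patterns come from the mapping above, while "connection refused" and "no such host" are assumed additional network signatures.

```shell
# Classify a kubelet image-pull event message as auth, image, network,
# or unknown. Matching is case-insensitive and checked in that order.
# classify_pull_error and the category labels are hypothetical.
classify_pull_error() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"401 unauthorized"*)
      echo auth ;;
    *"manifest unknown"*|*"not found"*)
      echo image ;;
    # "connection refused" and "no such host" are assumptions, not from the guide.
    *"timeout"*|*"connection refused"*|*"no such host"*)
      echo network ;;
    *)
      echo unknown ;;
  esac
}

# Example use:
# classify_pull_error "$(kubectl describe pod <POD> -n <NS> | grep -i 'failed to pull')"
```

An "unknown" result means the message needs to be read by hand; the patterns are deliberately narrow.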

Step 2: Run the authoritative connectivity test

az aks check-acr --resource-group <RG> --name <AKS> --acr <REGISTRY>.azurecr.io

This validates registry access from inside the cluster using the kubelet identity and reports where the check fails.

Step 3: Verify the kubelet identity holds AcrPull

KUBELET_OID=$(az aks show -g <RG> -n <AKS> --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n <REGISTRY> --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" --query "[].roleDefinitionName" -o tsv

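Empty output means the role is missing at the registry scope, so attach the ACR in Step 5. An assignment made at a parent scope (resource group or subscription) is not listed unless you add --include-inherited.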

Step 4: Confirm the image exists and check ACR network rules

az acr repository show-tags --name <REGISTRY> --repository <REPO> --orderby time_desc -o table
az acr show -n <REGISTRY> --query "{publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc

Make sure the tag is present and the node’s network path to ACR is allowed.

Step 5: Attach the ACR (or fix the gap) and re-roll the pods

az aks update -g <RG> -n <AKS> --attach-acr <REGISTRY>
# Wait for role propagation, then force a fresh pull
kubectl rollout restart deployment/<DEPLOY> -n <NS>
kubectl get pods -n <NS> -w
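The "wait for role propagation" step can be scripted as a bounded retry rather than a fixed sleep. The following is a minimal sketch; retry_until and role_ready are hypothetical helper names, and the attempt count and delay in the usage comment are arbitrary illustrative values.

```shell
# Run a command repeatedly until it succeeds or the attempt limit is reached.
# Usage: retry_until <max-attempts> <delay-seconds> <command> [args...]
# retry_until is a hypothetical helper name used only in this sketch.
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "succeeded on attempt $attempt"
}

# Example use (role_ready is hypothetical; it reuses the Step 3 check):
# role_ready() {
#   az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" \
#     --query "[].roleDefinitionName" -o tsv | grep -Fxq 'AcrPull'
# }
# retry_until 10 30 role_ready && kubectl rollout restart deployment/<DEPLOY> -n <NS>
```

Bounding the retries keeps a pipeline from hanging if the assignment was never created in the first place.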

Example Root Cause Analysis

A new deployment orders-api:v1.4.2 rolls out to prod-aks and every pod lands in ImagePullBackOff. Older pods on the previous tag keep running fine.

The pod event names the failure:

Failed to pull image "prodacr.azurecr.io/orders-api:v1.4.2": ... failed to authorize: ... 401 Unauthorized

A 401, rather than a not found error, points at authorization: the image exists but the pull is unauthorized. Running the built-in check confirms it:

az aks check-acr --resource-group rg-prod --name prod-aks --acr prodacr.azurecr.io
Your cluster cannot pull images from prodacr.azurecr.io.
Error: failed to authorize: 401 Unauthorized.

Checking which identity should hold the role and whether it does:

KUBELET_OID=$(az aks show -g rg-prod -n prod-aks --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n prodacr --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" --query "[].roleDefinitionName" -o tsv
(empty)

The kubelet identity has no AcrPull on the registry. The registry was recently recreated in a different resource group during a migration, so the old attachment no longer applies — but the older running pods kept their already-pulled layers cached, which is why only the new image failed.

Fix: re-attach the ACR to the cluster and restart the deployment:

az aks update -g rg-prod -n prod-aks --attach-acr prodacr
kubectl rollout restart deployment/orders-api -n prod

After role propagation the kubelet authenticates, pulls v1.4.2, and the pods reach Running.

Prevention Best Practices

  • Attach ACR to AKS with az aks update --attach-acr rather than hand-managing imagePullSecrets; it grants AcrPull to the correct kubelet identity automatically.
  • After any ACR recreation/migration, re-run az aks check-acr from your runbook — cached images hide the broken auth until the next new tag.
  • Grant AcrPull to the kubeletidentity.objectId, never the control-plane identity; the two are easy to confuse.
  • For private ACR, verify the private DNS zone is linked to the node VNet so nodes resolve the registry before relying on firewall allow lists.
  • Keep CI image tags immutable and push before deploy, so not found failures are caught in the pipeline, not on the node.

Quick Command Reference

# Read the exact pull failure
kubectl describe pod <POD> -n <NS> | grep -A4 -i "failed to pull"

# Authoritative ACR connectivity test (uses kubelet identity)
az aks check-acr --resource-group <RG> --name <AKS> --acr <REGISTRY>.azurecr.io

# Which identity does the kubelet use, and does it hold AcrPull?
az aks show -g <RG> -n <AKS> --query "identityProfile" -o jsonc
KUBELET_OID=$(az aks show -g <RG> -n <AKS> --query "identityProfile.kubeletidentity.objectId" -o tsv)
ACR_ID=$(az acr show -n <REGISTRY> --query id -o tsv)
az role assignment list --assignee "$KUBELET_OID" --scope "$ACR_ID" -o table

# Does the image/tag exist?
az acr repository show-tags --name <REGISTRY> --repository <REPO> --orderby time_desc -o table

# ACR network posture
az acr show -n <REGISTRY> --query "{sku:sku.name, publicAccess:publicNetworkAccess, default:networkRuleSet.defaultAction}" -o jsonc

# Inspect imagePullSecrets on the pod
kubectl get pod <POD> -n <NS> -o jsonpath='{.spec.imagePullSecrets[*].name}'

# Attach ACR and re-roll
az aks update -g <RG> -n <AKS> --attach-acr <REGISTRY>
kubectl rollout restart deployment/<DEPLOY> -n <NS>

Conclusion

ImagePullBackOff on AKS pulling from ACR is most often an authorization gap surfaced as 401 Unauthorized, but image and network problems produce the same pod status. The usual root causes:

  1. The cluster is not attached to ACR, so the kubelet identity has no AcrPull role.
  2. The image name or tag is wrong and does not exist in the registry.
  3. An ACR private endpoint/firewall blocks the node’s network path.
  4. An explicit imagePullSecret is missing or built from an expired credential.
  5. AcrPull was granted to the control-plane identity instead of the kubelet identity.
  6. The ACR lives in a different subscription or tenant, so the role assignment never applies.

Read the describe event first — 401 means fix the identity/attachment, not found means fix the tag, and a timeout means fix the network — then re-run az aks check-acr to confirm before re-rolling the pods.
