AWS Error Guide: 'CannotPullContainerError' ECS Task Image Pull Failures

Overview

CannotPullContainerError is the stopped-task reason ECS reports when the container runtime on the host could not download the image referenced by a task definition. The agent asked the runtime to pull the image, the pull failed, and the task moved to STOPPED before any container started. The failure happens at image-pull time, so application logs are empty — the diagnosis lives in the task’s stoppedReason.

You see it in the stopped task or service events:

CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v42": failed to do request: Head "https://123456789012.dkr.ecr.us-east-1.amazonaws.com/v2/api/manifests/v42": dial tcp 10.0.3.17:443: i/o timeout

It can occur on any kind of task launch: a service deployment, a scheduled task, or a run-task. The cause is almost always one of three things: failed authentication, an image or tag that does not exist, or no network path from the task’s subnet to the registry.

Symptoms

Tasks cycle from PROVISIONING or PENDING to STOPPED without reaching RUNNING, with a stoppedReason containing CannotPullContainerError.
A service deployment never reaches steady state; events repeat the pull failure.
The same image runs locally but fails in a private subnet.
stoppedReason includes i/o timeout, manifest unknown, 403 Forbidden, or toomanyrequests.

aws ecs describe-tasks --cluster prod --tasks <TASK_ARN> \
  --query 'tasks[0].[lastStatus,stoppedReason]' --output text

STOPPED	CannotPullContainerError: failed to resolve ref "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v42": ... dial tcp 10.0.3.17:443: i/o timeout

aws ecs describe-services --cluster prod --services api \
  --query 'services[0].events[0:3].message' --output text

(service api) failed to launch a task with (error CannotPullContainerError: ...).

Common Root Causes

1. The execution role cannot authenticate to ECR

ECS uses the task execution role to pull from ECR. If that role lacks ecr:GetAuthorizationToken, ecr:BatchGetImage, or ecr:GetDownloadUrlForLayer, the pull returns 403 Forbidden.

aws ecs describe-task-definition --task-definition api \
  --query 'taskDefinition.executionRoleArn' --output text
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage \
  --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text

ecr:GetAuthorizationToken	implicitDeny
ecr:BatchGetImage	allowed

implicitDeny on GetAuthorizationToken blocks every ECR pull — attach AmazonECSTaskExecutionRolePolicy or the missing action.
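A minimal sketch of the fix, assuming the execution role should simply receive the AWS managed policy. The function name attach_execution_policy is hypothetical; the role name is whatever describe-task-definition returned.

```shell
# attach_execution_policy ROLE_NAME
# Attaches the AWS managed policy that carries the ECR pull actions.
# (Hypothetical helper name; the aws iam call itself is the standard CLI command.)
attach_execution_policy() {
  aws iam attach-role-policy \
    --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
}

# Usage (not run here):
#   attach_execution_policy ecsTaskExecutionRole
```

Re-running the simulate-principal-policy check afterwards should show allowed for both actions.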

2. The image tag does not exist in the repository

The task definition references a tag that was never pushed (typo, failed CI push, or the tag was deleted). The registry returns manifest unknown.

aws ecr describe-images --repository-name api \
  --query 'reverse(sort_by(imageDetails,&imagePushedAt))[0:5].imageTags' --output text

v41	latest
v40
v39

The task wants v42 but the newest tag is v41 — the image was never pushed. Fix the tag or push the build.

3. No network route from a private subnet to the registry

A task in a private subnet with no NAT gateway and no ECR VPC endpoints cannot reach ECR or S3 (ECR layers live in S3). The pull times out (i/o timeout).

aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=vpc-0abc1234 \
  --query 'VpcEndpoints[].ServiceName' --output text

com.amazonaws.us-east-1.ssm

Only the SSM endpoint exists — no ecr.api, ecr.dkr, or s3 endpoint and (presumably) no NAT. Add the three endpoints or a NAT route.
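Creating the three endpoints could look like the sketch below. It assumes one subnet, one route table, and a security group for the interface endpoints that allows inbound 443 from the tasks; the function name and argument order are hypothetical.

```shell
# create_pull_endpoints VPC_ID REGION SUBNET_ID ENDPOINT_SG_ID ROUTE_TABLE_ID
# Creates the two ECR interface endpoints and the S3 gateway endpoint.
# Assumption: ENDPOINT_SG_ID allows inbound TCP 443 from the task security group.
create_pull_endpoints() {
  local vpc="$1" region="$2" subnet="$3" sg="$4" rtb="$5" svc
  for svc in ecr.api ecr.dkr; do
    aws ec2 create-vpc-endpoint \
      --vpc-id "$vpc" \
      --vpc-endpoint-type Interface \
      --service-name "com.amazonaws.${region}.${svc}" \
      --subnet-ids "$subnet" \
      --security-group-ids "$sg" \
      --private-dns-enabled || return 1
  done
  # Image layers are served from S3, which uses a gateway endpoint on the route table.
  aws ec2 create-vpc-endpoint \
    --vpc-id "$vpc" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${region}.s3" \
    --route-table-ids "$rtb"
}

# Usage (not run here; IDs are placeholders):
#   create_pull_endpoints vpc-0abc1234 us-east-1 <SUBNET> <ENDPOINT_SG> <ROUTE_TABLE>
```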

4. Wrong region or account in the image URI

The image URI points to ECR in a different region or account than the task is running in, so the host resolves a registry it cannot reach or is not authorized for.

aws ecs describe-task-definition --task-definition api \
  --query 'taskDefinition.containerDefinitions[0].image' --output text

123456789012.dkr.ecr.us-west-2.amazonaws.com/api:v42

The task runs in us-east-1 but the URI references us-west-2. Point the task definition at a repository in us-east-1, or replicate the image there with ECR cross-region replication.
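The region comparison can be automated by parsing the URI. This is a sketch that assumes the standard private ECR host form <account>.dkr.ecr.<region>.amazonaws.com (other partitions such as amazonaws.com.cn are not handled); both function names are hypothetical.

```shell
# ecr_uri_region IMAGE_URI
# Prints the region embedded in a private ECR image URI, or fails if the
# host is not of the form <account>.dkr.ecr.<region>.amazonaws.com.
ecr_uri_region() {
  local host="${1%%/*}"                 # keep only the registry host
  case "$host" in
    *.dkr.ecr.*.amazonaws.com)
      host="${host#*.dkr.ecr.}"         # drop "<account>.dkr.ecr."
      echo "${host%.amazonaws.com}"     # drop ".amazonaws.com"
      ;;
    *) return 1 ;;
  esac
}

# check_image_region IMAGE_URI TASK_REGION
# Returns 0 on a match, 1 on a mismatch, 2 if the URI is not private ECR.
check_image_region() {
  local uri_region
  uri_region="$(ecr_uri_region "$1")" || { echo "not a private ECR URI: $1"; return 2; }
  if [ "$uri_region" = "$2" ]; then
    echo "ok: image and task are both in $2"
  else
    echo "mismatch: image is in $uri_region, task runs in $2"
    return 1
  fi
}
```

For the URI above, check_image_region with us-east-1 as the task region reports the mismatch.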

5. Docker Hub or public registry rate limit

Pulling a public image (nginx, redis) anonymously from Docker Hub can exceed its pull-rate limit, and the registry then returns toomanyrequests.

aws ecs describe-tasks --cluster prod --tasks <TASK_ARN> \
  --query 'tasks[0].stoppedReason' --output text

CannotPullContainerError: ... toomanyrequests: You have reached your pull rate limit.

Use ECR Public, mirror the image into ECR, or supply registry credentials via repositoryCredentials.
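Mirroring into ECR might be scripted as below. This is a sketch, not a hardened pipeline: it assumes Docker is installed, the caller may push to ECR, and the target repository already exists. The function name and argument order are hypothetical.

```shell
# mirror_to_ecr SOURCE_IMAGE ACCOUNT_ID REGION REPO TAG
# Pulls a public image once and pushes a copy into a private ECR repository.
# Assumption: the repository REPO already exists in ACCOUNT_ID/REGION.
mirror_to_ecr() {
  local src="$1" registry="$2.dkr.ecr.$3.amazonaws.com"
  local dst="$registry/$4:$5"
  # Log Docker in to the private registry with a short-lived ECR token.
  aws ecr get-login-password --region "$3" \
    | docker login --username AWS --password-stdin "$registry" || return 1
  docker pull "$src" &&
    docker tag "$src" "$dst" &&
    docker push "$dst"
}

# Usage (not run here; repository name is a placeholder):
#   mirror_to_ecr nginx:latest 123456789012 us-east-1 <REPO> latest
```

The task definition then references the ECR copy, so task launches no longer count against the Docker Hub limit.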

6. Security group or NACL blocks egress 443

The task’s security group or the subnet NACL blocks outbound HTTPS, so even with a NAT/endpoint the TLS connection to the registry fails.

aws ec2 describe-security-groups --group-ids sg-0task1234 \
  --query 'SecurityGroups[0].IpPermissionsEgress[].[IpProtocol,FromPort,ToPort]' --output text

tcp	443	443

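The sample output shows a rule that allows outbound TCP 443. If no row covers port 443 and there is no all-traffic rule (protocol -1), or if the subnet NACL denies the traffic, the pull cannot establish a connection. NACLs are stateless, so they must also allow the inbound return traffic on ephemeral ports.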
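The rows printed by the describe-security-groups command can be checked mechanically. The sketch below reads them on stdin; it looks only at protocol and port range and ignores destination CIDRs and NACLs, and the function name is hypothetical.

```shell
# allows_https_egress
# Reads "protocol from-port to-port" rows on stdin and succeeds if any row
# lets TCP 443 out. Destination CIDRs and NACLs are NOT checked (assumption:
# the rule's destination covers the registry or endpoint).
allows_https_egress() {
  local proto from to
  while read -r proto from to || [ -n "$proto" ]; do
    case "$proto" in
      -1) return 0 ;;                       # all-traffic rule; port columns ignored
      tcp)
        case "$from" in ""|*[!0-9]*) continue ;; esac
        case "$to" in ""|*[!0-9]*) continue ;; esac
        if [ "$from" -le 443 ] && [ "$to" -ge 443 ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}

# Usage (not run here):
#   aws ec2 describe-security-groups --group-ids <SG> \
#     --query 'SecurityGroups[0].IpPermissionsEgress[].[IpProtocol,FromPort,ToPort]' \
#     --output text | allows_https_egress && echo "egress 443 allowed"
```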

Diagnostic Workflow

Step 1: Read the precise stoppedReason

aws ecs describe-tasks --cluster <CLUSTER> --tasks <TASK_ARN> \
  --query 'tasks[0].[stopCode,stoppedReason]' --output text

The tail of the message classifies it: i/o timeout = network, 403 Forbidden = auth, manifest unknown = missing tag, toomanyrequests = rate limit.
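That mapping is easy to encode. A minimal sketch, using only the four substrings listed above; the function name and the class labels are hypothetical.

```shell
# classify_stopped_reason REASON
# Maps a stoppedReason string to a likely failure class using the four
# substrings from this guide. Anything else is reported as "unknown".
classify_stopped_reason() {
  case "$1" in
    *toomanyrequests*)    echo "rate-limit" ;;
    *"manifest unknown"*) echo "missing-tag" ;;
    *"403 Forbidden"*)    echo "auth" ;;
    *"i/o timeout"*)      echo "network" ;;
    *)                    echo "unknown" ;;
  esac
}

# Usage (not run here):
#   classify_stopped_reason "$(aws ecs describe-tasks --cluster <CLUSTER> \
#     --tasks <TASK_ARN> --query 'tasks[0].stoppedReason' --output text)"
```

An unknown result means the message needs a manual read rather than that the pull succeeded.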

Step 2: Verify the image and tag exist

aws ecr describe-images --repository-name <REPO> \
  --image-ids imageTag=<TAG> --query 'imageDetails[0].imageTags' --output text

An ImageNotFoundException here means the tag is the problem, not the network.
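The same lookup can serve as a guard in a deploy script. This sketch relies only on the command's exit status, so it cannot tell a missing tag from a failed lookup (for example expired credentials); the function name is hypothetical.

```shell
# require_tag REPO TAG
# Succeeds only if describe-images finds REPO:TAG. Any CLI failure,
# including ImageNotFoundException, is treated as "missing".
require_tag() {
  if aws ecr describe-images --repository-name "$1" \
       --image-ids "imageTag=$2" >/dev/null 2>&1; then
    echo "found $1:$2"
  else
    echo "missing $1:$2 (or the lookup itself failed)" >&2
    return 1
  fi
}

# Usage (not run here):
#   require_tag <REPO> <TAG> || exit 1   # stop before registering the task definition
```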

Step 3: Confirm the execution role’s ECR permissions

aws iam simulate-principal-policy \
  --policy-source-arn <EXECUTION_ROLE_ARN> \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer

Any implicitDeny/explicitDeny explains a 403.
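To list only the actions that are not allowed, the simulation output can be filtered. A sketch under the assumption that the caller is permitted to run iam:SimulatePrincipalPolicy; the function name is hypothetical.

```shell
# denied_ecr_actions ROLE_ARN
# Prints each simulated pull action whose decision is not "allowed".
# Returns 0 if all are allowed, 1 if any is denied, 2 if the simulation failed.
denied_ecr_actions() {
  local results
  results="$(aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text)" || return 2
  printf '%s\n' "$results" |
    awk '$2 != "allowed" { print $1; denied = 1 } END { exit (denied ? 1 : 0) }'
}

# Usage (not run here):
#   denied_ecr_actions <EXECUTION_ROLE_ARN> || echo "fix the execution role first"
```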

Step 4: Check the network path from the task’s subnet

aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<VPC_ID> \
  --query 'VpcEndpoints[].ServiceName' --output text
aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=<SUBNET> \
  --query 'RouteTables[].Routes[?contains(to_string(@),`nat`)]' --output json

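A private subnet needs either a NAT route or the ecr.api, ecr.dkr, and s3 (gateway) endpoints. If the subnet has no explicit route table association, the filter above returns nothing and the VPC’s main route table applies, so check that table instead.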
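The endpoint list can be compared against the three required services. A sketch that reads the service names on stdin; the function name is hypothetical, and a missing endpoint is only a problem when the subnet also has no NAT route.

```shell
# missing_pull_endpoints REGION
# Reads VPC endpoint service names on stdin (whitespace separated) and
# prints each required pull endpoint that is absent. Returns 1 if any is missing.
missing_pull_endpoints() {
  local region="$1" present svc missing=0
  present=" $(tr -s '[:space:]' ' ') "       # normalise tabs/newlines to spaces
  for svc in ecr.api ecr.dkr s3; do
    case "$present" in
      *" com.amazonaws.${region}.${svc} "*) ;;
      *) echo "com.amazonaws.${region}.${svc}"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Usage (not run here):
#   aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<VPC_ID> \
#     --query 'VpcEndpoints[].ServiceName' --output text | missing_pull_endpoints us-east-1
```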

Step 5: Re-run the task and watch the new reason

aws ecs run-task --cluster <CLUSTER> --task-definition <TD> \
  --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[<SUBNET>],securityGroups=[<SG>]}"
aws ecs describe-tasks --cluster <CLUSTER> --tasks <NEW_TASK_ARN> \
  --query 'tasks[0].stoppedReason' --output text

A changed (or empty) reason confirms the fix or points to the next layer.
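Instead of polling by hand, the tasks-running waiter can do the watching. A sketch with a hypothetical function name; note that a very short-lived task may stop before the waiter sees it running, in which case the printed reason will not be a pull error.

```shell
# verify_pull CLUSTER TASK_ARN
# Waits for the task to reach RUNNING. If the waiter fails (for example the
# task went to STOPPED), prints the task's stoppedReason and returns 1.
verify_pull() {
  if aws ecs wait tasks-running --cluster "$1" --tasks "$2"; then
    echo "task reached RUNNING: the image pulled"
  else
    aws ecs describe-tasks --cluster "$1" --tasks "$2" \
      --query 'tasks[0].stoppedReason' --output text
    return 1
  fi
}

# Usage (not run here):
#   verify_pull <CLUSTER> <NEW_TASK_ARN>
```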

Example Root Cause Analysis

A Fargate service, payments, deployed a new task definition and immediately began failing with CannotPullContainerError. The stopped reason ended in i/o timeout, pointing at the network rather than auth.

The service had recently been moved to private subnets. Checking endpoints:

aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=vpc-0abc1234 \
  --query 'VpcEndpoints[].ServiceName' --output text

com.amazonaws.us-east-1.ecr.api	com.amazonaws.us-east-1.ecr.dkr

The ecr.api and ecr.dkr interface endpoints existed, but the S3 gateway endpoint was missing. ECR stores image layers in S3, so the manifest HEAD resolved but the layer download timed out.

aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=vpc-0abc1234 \
  --query 'VpcEndpoints[?contains(ServiceName,`s3`)]' --output text

Empty — no S3 endpoint and no NAT. Fix: create the S3 gateway endpoint and associate it with the private subnets’ route tables.

aws ec2 create-vpc-endpoint --vpc-id vpc-0abc1234 \
  --service-name com.amazonaws.us-east-1.s3 --vpc-endpoint-type Gateway \
  --route-table-ids rtb-0priv5678

The next deployment pulled the image and reached steady state.

Prevention Best Practices

Always attach AmazonECSTaskExecutionRolePolicy (or its ECR actions) to the task execution role — it is the role ECS uses to pull, distinct from the task role.
For tasks in private subnets, provision all three pull endpoints: ecr.api, ecr.dkr, and the s3 gateway endpoint; the S3 endpoint is the one most often forgotten.
Pin task definitions to immutable image digests or build-stamped tags, not latest, so a missing/overwritten tag never silently breaks a deploy.
Mirror public base images into ECR (or use ECR Public) to avoid Docker Hub toomanyrequests rate limits in CI and at scale.
Keep the image URI’s region and account aligned with where tasks run, or configure cross-region replication.
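Pinning to a digest, as recommended above, can be scripted by resolving the tag once at build time. A sketch that assumes the repository lives in the caller's own account; the function name is hypothetical.

```shell
# digest_uri ACCOUNT_ID REGION REPO TAG
# Resolves REPO:TAG to its sha256 digest and prints a digest-pinned image URI.
# Assumption: the repository is in the caller's default registry (no --registry-id).
digest_uri() {
  local digest
  digest="$(aws ecr describe-images --region "$2" --repository-name "$3" \
    --image-ids "imageTag=$4" \
    --query 'imageDetails[0].imageDigest' --output text)" || return 1
  case "$digest" in
    sha256:*) echo "$1.dkr.ecr.$2.amazonaws.com/$3@$digest" ;;
    *) return 1 ;;                          # no usable digest came back
  esac
}

# Usage (not run here):
#   image="$(digest_uri 123456789012 us-east-1 <REPO> <TAG>)"
```

Putting the resulting URI in the task definition's image field means a later overwrite or deletion of the tag does not change what the task pulls.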

Quick Command Reference

# Precise stop reason
aws ecs describe-tasks --cluster <CLUSTER> --tasks <TASK_ARN> \
  --query 'tasks[0].[stopCode,stoppedReason]' --output text

# Does the tag exist?
aws ecr describe-images --repository-name <REPO> --image-ids imageTag=<TAG> \
  --query 'imageDetails[0].imageTags' --output text

# Execution role ECR permissions
aws iam simulate-principal-policy --policy-source-arn <EXEC_ROLE> \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer

# VPC endpoints present (need ecr.api, ecr.dkr, s3)
aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<VPC_ID> \
  --query 'VpcEndpoints[].ServiceName' --output text

# Image URI in the task definition
aws ecs describe-task-definition --task-definition <TD> \
  --query 'taskDefinition.containerDefinitions[0].image' --output text

Conclusion

CannotPullContainerError means the container runtime could not download the image before the task could start. The usual root causes:

The task execution role lacks ecr:GetAuthorizationToken / ecr:BatchGetImage.
The referenced image tag does not exist in the repository.
No network route (NAT or ECR + S3 endpoints) from a private subnet to the registry.
A wrong region or account in the image URI.
A Docker Hub / public registry rate limit (toomanyrequests).
A security group or NACL blocking outbound 443.

Read the tail of stoppedReason first: it classifies the failure as auth, missing-tag, network, or rate-limit. Then fix that one layer and re-run the task.
