AWS with AI · By James Joyner IV · 9 min read

AWS Error Guide: 'InsufficientInstanceCapacity' EC2 Launch Capacity Failures

Fix the EC2 InsufficientInstanceCapacity error: diagnose AZ capacity shortfalls, rigid instance types, capacity reservations, placement groups, and ASG strategies.

  • #aws
  • #troubleshooting
  • #errors
  • #ec2

Overview

InsufficientInstanceCapacity means AWS does not currently have enough physical capacity for the exact instance type in the exact Availability Zone you requested. It is not a quota or permissions problem — it is a supply problem on AWS’s side at that moment. The request is rejected outright; the instance is not queued.

You see it from run-instances, an Auto Scaling Group, a Spot request, or a Fleet:

An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 2): We currently do not have sufficient m6i.24xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get m6i.24xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1c.

It occurs most with large/special instance types, in popular AZs, during regional demand spikes, when an Auto Scaling Group is pinned to one AZ and one type, or when a placement group or Capacity Reservation forces a narrow placement.

Symptoms

  • RunInstances fails with InsufficientInstanceCapacity naming the type and AZ.
  • An ASG cannot scale out; activity history shows repeated capacity failures.
  • A Spot request stays open with status capacity-not-available.
  • The same launch succeeds in a different AZ or with a different size.

aws ec2 run-instances --instance-type m6i.24xlarge \
  --image-id ami-0abcd1234ef567890 --subnet-id subnet-0aaa1111 --count 1
An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation: We currently do not have sufficient m6i.24xlarge capacity in the Availability Zone you requested (us-east-1a).

aws autoscaling describe-scaling-activities --auto-scaling-group-name web-asg \
  --max-records 3 --query 'Activities[].StatusMessage' --output text
Could not launch On-Demand Instances. InsufficientInstanceCapacity - We currently do not have sufficient m6i.24xlarge capacity in the Availability Zone you requested (us-east-1a). Launching EC2 instance failed.

Common Root Causes

1. Single AZ pinned for a scarce type

The request (or the ASG/subnet) targets one AZ, and that AZ is out of the chosen type even though others have it.

aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=m6i.24xlarge \
  --query 'InstanceTypeOfferings[].Location' --output text
us-east-1b	us-east-1c	us-east-1d

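The type is offered in 1b/1c/1d, so those AZs are candidates when us-east-1a is short. This list shows where the type is sold, not how much capacity is free at the moment, so spread launches across several AZs rather than moving the pin to one of them.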
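To act on that list, you need subnets that sit in the candidate AZs. The following is a minimal sketch, assuming a helper named candidate_subnets (hypothetical, not part of the AWS CLI) and a placeholder VPC ID; it only wraps the standard describe-subnets call with vpc-id and availability-zone filters.

```shell
# candidate_subnets VPC_ID AZ [AZ...]
# Hypothetical helper: lists the subnets of one VPC that sit in the given AZs.
candidate_subnets() {
  vpc_id=$1; shift
  # Join the remaining arguments (AZ names) with commas for the filter value.
  azs=$(printf '%s,' "$@"); azs=${azs%,}
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$vpc_id" "Name=availability-zone,Values=$azs" \
    --query 'Subnets[].[SubnetId,AvailabilityZone]' --output text
}

# Usage (placeholder VPC ID, not executed here):
#   candidate_subnets vpc-0123456789abcdef0 us-east-1b us-east-1c us-east-1d
```

Each output row pairs a subnet ID with its AZ, which is the input the later relaunch and ASG changes need.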

2. A single rigid instance type with no fallback

Pinning one exact type means a momentary shortage of that one SKU blocks the launch. Allowing a flexible set of compatible types lets EC2 satisfy the request from whatever is available.

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names web-asg \
  --query 'AutoScalingGroups[0].MixedInstancesPolicy.LaunchTemplate.Overrides[].InstanceType' \
  --output text
None

None means no instance-type overrides — the ASG can only launch the one type in the launch template.
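One way to add overrides is to generate a mixed-instances policy document and pass it to update-auto-scaling-group. This is a sketch under assumptions: mixed_policy_json is a hypothetical helper, web-lt is a placeholder launch template name, and the instance types in the usage comment are only illustrative of same-size alternatives. Check that each type suits your workload before using it.

```shell
# mixed_policy_json TEMPLATE_NAME TYPE [TYPE...]
# Hypothetical helper: prints a MixedInstancesPolicy document with one
# override per instance type, in the order given.
mixed_policy_json() {
  template=$1; shift
  overrides=""
  for t in "$@"; do
    overrides="${overrides}{\"InstanceType\":\"$t\"},"
  done
  overrides=${overrides%,}   # drop the trailing comma
  printf '{"LaunchTemplate":{"LaunchTemplateSpecification":{"LaunchTemplateName":"%s","Version":"$Latest"},"Overrides":[%s]}}\n' \
    "$template" "$overrides"
}

# Usage (placeholder names, not executed here):
#   mixed_policy_json web-lt m6i.24xlarge m6a.24xlarge m5.24xlarge > policy.json
#   aws autoscaling update-auto-scaling-group --auto-scaling-group-name web-asg \
#     --mixed-instances-policy file://policy.json
```

After the update, the describe-auto-scaling-groups query above should list the override types instead of None.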

3. A Capacity Reservation or placement constraint forcing the AZ

A targeted Capacity Reservation, a cluster placement group, or Dedicated tenancy constrains placement to one AZ or host pool, and that pool may be exhausted.

aws ec2 describe-capacity-reservations \
  --query 'CapacityReservations[?State==`active`].[InstanceType,AvailabilityZone,AvailableInstanceCount]' \
  --output text
m6i.24xlarge	us-east-1a	0

The reservation in us-east-1a has 0 available, and the launch targets it — it cannot place there.
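With many reservations, the exhausted ones are easy to miss in raw output. A small sketch, assuming a hypothetical filter named exhausted_reservations that reads the three tab-separated columns produced by the query above:

```shell
# exhausted_reservations
# Hypothetical filter: reads "TYPE<TAB>AZ<TAB>AVAILABLE" rows on stdin and
# prints one line for every reservation whose available count is 0.
exhausted_reservations() {
  awk -F '\t' '$3 == 0 { printf "%s in %s has no free slots\n", $1, $2 }'
}

# Usage (not executed here):
#   aws ec2 describe-capacity-reservations \
#     --query 'CapacityReservations[?State==`active`].[InstanceType,AvailabilityZone,AvailableInstanceCount]' \
#     --output text | exhausted_reservations
```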

4. Spot capacity not available for the pool

A Spot request targets an instance type/AZ pool with no spare capacity; the request stays open with capacity-not-available instead of returning an error.

aws ec2 describe-spot-instance-requests \
  --query 'SpotInstanceRequests[].[SpotInstanceRequestId,State,Status.Code]' --output text
sir-abc12345	open	capacity-not-available

capacity-not-available means the requested pool has no spare Spot capacity. Widen the request to more types and AZs so it can be filled.

5. A large/specialized type during regional demand

GPU, high-memory, and very large general-purpose types are scarcer. During regional spikes (events, ML training surges) even multi-AZ launches can fail for the biggest SKUs.

aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=p4d.24xlarge \
  --query 'InstanceTypeOfferings[].Location' --output text
us-east-1a

A type offered in only one AZ regionwide has no fallback AZ — consider another region or an On-Demand Capacity Reservation.

6. ASG hard-codes a single subnet/AZ

The ASG references a single subnet (one AZ) instead of subnets across multiple AZs, so it can never fail over.

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names web-asg \
  --query 'AutoScalingGroups[0].[VPCZoneIdentifier,AvailabilityZones]' --output text
subnet-0aaa1111	us-east-1a

A single subnet/AZ means every scale-out attempt hits the same pool. Add subnets in other AZs.
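Adding subnets is a single update-auto-scaling-group call. The sketch below assumes a hypothetical wrapper named spread_asg and placeholder subnet IDs. Note that --vpc-zone-identifier replaces the existing list, so include the original subnet if you want to keep it.

```shell
# spread_asg ASG_NAME SUBNET [SUBNET...]
# Hypothetical wrapper: sets the ASG's subnets to the full list given.
# The list REPLACES the current VPCZoneIdentifier; it is not appended.
spread_asg() {
  asg=$1; shift
  subnets=$(printf '%s,' "$@"); subnets=${subnets%,}   # comma-joined list
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$asg" \
    --vpc-zone-identifier "$subnets"
}

# Usage (placeholder subnet IDs, one per AZ, not executed here):
#   spread_asg web-asg subnet-0aaa1111 subnet-0bbb2222 subnet-0ccc3333
```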

Diagnostic Workflow

Step 1: Capture the exact type and AZ from the message

aws ec2 run-instances --instance-type <TYPE> --image-id <AMI> \
  --subnet-id <SUBNET> --count 1 2>&1 | grep -oE 'sufficient .*'

The message names the scarce type and the AZ, and often suggests AZs that do have capacity.
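If you want those fields in automation rather than by eye, the message can be split with sed. This is only a sketch: parse_capacity_error is a hypothetical helper, and it assumes the message wording shown in this guide, which is human-readable text rather than a stable interface.

```shell
# parse_capacity_error MESSAGE
# Hypothetical helper: extracts the instance type, the failing AZ, and any
# suggested AZs from an InsufficientInstanceCapacity message.
parse_capacity_error() {
  msg=$1
  # Instance type: the token between "sufficient " and " capacity".
  itype=$(printf '%s\n' "$msg" | sed -n 's/.*sufficient \([a-z0-9.-]*\) capacity.*/\1/p')
  # Failing AZ: the parenthesised value after "you requested".
  az=$(printf '%s\n' "$msg" | sed -n 's/.*you requested (\([a-z0-9-]*\)).*/\1/p')
  # Suggested AZs: the list after "or choosing", when the message has one.
  alt=$(printf '%s\n' "$msg" | sed -n 's/.*or choosing \(.*\)\.$/\1/p' | tr -d ' ')
  printf 'type=%s az=%s alternatives=%s\n' "$itype" "$az" "$alt"
}
```

When the message carries no suggestion, the alternatives field is simply empty.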

Step 2: Find which AZs actually offer the type

aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=<TYPE> \
  --query 'InstanceTypeOfferings[].Location' --output text

If the type is offered in other AZs, launching there (or omitting the AZ) is the immediate fix.

Step 3: Rule out reservations and placement constraints

aws ec2 describe-capacity-reservations \
  --query 'CapacityReservations[?State==`active`].[InstanceType,AvailabilityZone,AvailableInstanceCount]' \
  --output text
aws ec2 describe-placement-groups --query 'PlacementGroups[].[GroupName,Strategy]' --output text

A cluster placement group or a 0-available reservation forces a narrow pool.

Step 4: For ASGs, check AZ spread and type flexibility

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <ASG> \
  --query 'AutoScalingGroups[0].[VPCZoneIdentifier,MixedInstancesPolicy.LaunchTemplate.Overrides[].InstanceType]' \
  --output json

A single subnet and no overrides means the ASG cannot adapt to capacity.

Step 5: Relaunch with flexibility (multi-AZ or instance-type set)

# Retry in a subnet that belongs to a different AZ (a subnet always fixes the AZ)
aws ec2 run-instances --instance-type <TYPE> --image-id <AMI> --count 1 \
  --subnet-id <SUBNET_IN_OTHER_AZ>

Targeting a different AZ (or, in a default VPC, omitting both the subnet and the AZ so EC2 chooses) and allowing several compatible types resolves most occurrences immediately.
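The AZ fallback can be scripted by trying one subnet per AZ in order. The sketch below assumes a hypothetical function named launch_in_first_available and placeholder IDs. It moves to the next subnet only on InsufficientInstanceCapacity and stops on any other error. Be aware that a successful call really launches an instance.

```shell
# launch_in_first_available TYPE AMI SUBNET [SUBNET...]
# Hypothetical helper: tries each subnet (ideally one per AZ) until a launch
# succeeds, then prints "SUBNET INSTANCE_ID".
launch_in_first_available() {
  itype=$1; ami=$2; shift 2
  for subnet in "$@"; do
    if out=$(aws ec2 run-instances --instance-type "$itype" --image-id "$ami" \
               --subnet-id "$subnet" --count 1 \
               --query 'Instances[0].InstanceId' --output text 2>&1); then
      printf '%s %s\n' "$subnet" "$out"
      return 0
    fi
    case $out in
      *InsufficientInstanceCapacity*) continue ;;   # no capacity here: next AZ
      *) printf '%s\n' "$out" >&2; return 1 ;;      # any other error: stop
    esac
  done
  echo "no capacity in any supplied subnet" >&2
  return 2
}

# Usage (placeholder IDs, not executed here):
#   launch_in_first_available m6i.24xlarge ami-0abcd1234ef567890 \
#     subnet-0aaa1111 subnet-0bbb2222 subnet-0ccc3333
```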

Example Root Cause Analysis

An overnight batch ASG, render-asg, repeatedly failed to scale out, blocking the render queue. Activity history showed InsufficientInstanceCapacity for c6i.32xlarge in us-east-1a.

The ASG was pinned to one subnet:

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names render-asg \
  --query 'AutoScalingGroups[0].[VPCZoneIdentifier,MixedInstancesPolicy]' --output json
[
    "subnet-0aaa1111",
    null
]

One subnet (us-east-1a) and no mixed-instances policy — a single rigid type in a single AZ. Checking offerings:

aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=c6i.32xlarge \
  --query 'InstanceTypeOfferings[].Location' --output text
us-east-1b	us-east-1c	us-east-1d

The type was offered in three other AZs. Fix: add subnets in us-east-1b/1c/1d to the ASG and attach a mixed-instances policy with c6i.32xlarge, c6a.32xlarge, and m6i.32xlarge as overrides, plus capacity-optimized allocation. The next scale-out launched in us-east-1c and the queue drained.

Prevention Best Practices

  • Spread ASGs and subnets across at least three AZs; never pin a scaling group to one AZ for a scarce type.
  • Use a mixed-instances policy with several compatible types (and, for Spot capacity, a capacity-optimized allocation strategy) so EC2 can fill from whatever pool has supply.
  • For guaranteed launches of large/GPU types, create On-Demand Capacity Reservations (or use a Capacity Block) in the AZ you need ahead of demand.
  • Treat InsufficientInstanceCapacity as retryable with backoff in launch automation — capacity frees up continuously, so a delayed retry across AZs often succeeds.
  • Diversify Spot pools across many type/AZ combinations so a single pool’s shortage does not stall the fleet.
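The retry-with-backoff practice above can be sketched as a small wrapper. This assumes a hypothetical function named retry_on_capacity; the attempt count and starting delay are values you pick, not AWS recommendations. It retries only when the command's output mentions InsufficientInstanceCapacity and doubles the delay after each failed attempt.

```shell
# retry_on_capacity MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND with exponential backoff, but only
# while it keeps failing with InsufficientInstanceCapacity.
retry_on_capacity() {
  max=$1; delay=$2; shift 2
  attempt=1
  while :; do
    if out=$("$@" 2>&1); then
      printf '%s\n' "$out"
      return 0
    fi
    case $out in
      *InsufficientInstanceCapacity*) ;;            # retryable
      *) printf '%s\n' "$out" >&2; return 1 ;;      # other errors fail fast
    esac
    [ "$attempt" -ge "$max" ] && break
    sleep "$delay"
    delay=$((delay * 2))                            # exponential backoff
    attempt=$((attempt + 1))
  done
  printf '%s\n' "$out" >&2                          # attempts exhausted
  return 1
}

# Usage (placeholder IDs, not executed here):
#   retry_on_capacity 5 30 aws ec2 run-instances --instance-type m6i.24xlarge \
#     --image-id ami-0abcd1234ef567890 --subnet-id subnet-0bbb2222 --count 1
```

Combining this with a different subnet per attempt covers both halves of the advice: wait for capacity to free up and look in more than one AZ.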

Quick Command Reference

# Which AZs offer the type?
aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=<TYPE> \
  --query 'InstanceTypeOfferings[].Location' --output text

# ASG capacity failures
aws autoscaling describe-scaling-activities --auto-scaling-group-name <ASG> \
  --max-records 5 --query 'Activities[].StatusMessage' --output text

# AZ spread and type flexibility of an ASG
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <ASG> \
  --query 'AutoScalingGroups[0].[VPCZoneIdentifier,MixedInstancesPolicy]' --output json

# Capacity reservations and placement groups
aws ec2 describe-capacity-reservations \
  --query 'CapacityReservations[?State==`active`].[InstanceType,AvailabilityZone,AvailableInstanceCount]' --output text
aws ec2 describe-placement-groups --query 'PlacementGroups[].[GroupName,Strategy]' --output text

# Spot request status
aws ec2 describe-spot-instance-requests \
  --query 'SpotInstanceRequests[].[SpotInstanceRequestId,State,Status.Code]' --output text

Conclusion

InsufficientInstanceCapacity means AWS lacks physical supply for that exact type in that exact AZ right now. The usual root causes:

  1. A single AZ pinned for a type that is scarce there but available elsewhere.
  2. One rigid instance type with no compatible fallbacks.
  3. A Capacity Reservation, placement group, or dedicated tenancy forcing a narrow pool.
  4. A Spot pool with no available capacity (capacity-not-available).
  5. A large or specialized type during a regional demand spike.
  6. An ASG hard-coded to a single subnet/AZ.

Spread across AZs, allow a flexible set of types, and retry with backoff — capacity is transient, so flexibility is the durable fix.
