Azure Error Guide: 'AllocationFailed' Unable to Allocate Compute Capacity
Fix the Azure AllocationFailed / ZonalAllocationFailed error when capacity is unavailable for a VM size in a region or zone: change SKU, zone, or constraints.
- #azure
- #troubleshooting
- #errors
- #compute
Exact Error Message
The classic region-wide variant looks like this:
(AllocationFailed) Allocation failed. We do not have sufficient capacity for the
requested VM size in this region. Read more about improving likelihood of allocation
success at https://aka.ms/allocation-guidance.
Code: AllocationFailed
Message: Allocation failed. We do not have sufficient capacity for the requested VM
size in this region.
When you pin a VM to a specific availability zone, you get the zonal variant:
(ZonalAllocationFailed) Allocation failed. We do not have sufficient capacity for the
requested VM size in this zone. Read more about improving likelihood of allocation
success at https://aka.ms/allocation-guidance.
Code: ZonalAllocationFailed
Message: Allocation failed. We do not have sufficient capacity for the requested VM
size in this zone.
And when too many constraints are combined, the platform returns an over-constrained variant:
(OverconstrainedAllocationRequest) Allocation failed. VM(s) with the following
constraints cannot be allocated, because the constraints are not supportable or
because capacity is unavailable. Please remove some constraints and retry.
Constraints applied: Networking Constraints (Accelerated Networking), Availability
Zone, Proximity Placement Group, VM Size (Standard_D8s_v5).
Code: OverconstrainedAllocationRequest
What the Error Means
AllocationFailed is a capacity error, not a permissions or quota error. When you start or create a VM, Azure must find a physical host in the target region (and zone, if specified) that can satisfy every constraint of your request: the exact VM SKU, networking features, placement groups, and so on. If no host with free capacity matches, the allocation fails and the control plane returns this error.
Critically, this is about Microsoft’s available physical capacity at that moment, in that location, for that hardware family. It is transient and location-specific. The same request that fails in eastus zone 1 may succeed seconds later in zone 2, in a different region, or with a slightly different SKU. It has nothing to do with your subscription’s quota (which is a separate OperationNotAllowed error) or your role assignments.
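The distinction between capacity, quota, and restriction errors can be captured in a small triage helper. This is a minimal sketch: the function name classify_azure_error and the category labels are made up for illustration, and the mapping covers only the error codes discussed in this guide.

```shell
#!/bin/sh
# Map an Azure error code to the kind of fix it calls for.
# The category labels (capacity, quota, restriction, unknown) are
# illustrative names for this sketch, not Azure terminology.
classify_azure_error() {
  case "$1" in
    AllocationFailed|ZonalAllocationFailed|OverconstrainedAllocationRequest)
      echo capacity ;;     # change SKU, zone, region, or constraints, or retry
    OperationNotAllowed)
      echo quota ;;        # typically a quota limit: request an increase
    SkuNotAvailable)
      echo restriction ;;  # pick a SKU or region offered to the subscription
    *)
      echo unknown ;;
  esac
}
```

For example, `classify_azure_error ZonalAllocationFailed` prints `capacity`, which points at changing placement rather than requesting a quota increase.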
Common Causes
- Capacity shortage for a popular or constrained SKU. Newer or GPU/HPC families (N-series, H-series, specialized _v5/_v6 SKUs) and large memory-optimized sizes are frequently constrained in busy regions.
- Pinning to a single availability zone. A zone is a smaller capacity pool than the whole region, so ZonalAllocationFailed tends to happen more often than the region-wide error.
- Combining too many constraints. Proximity placement group + specific zone + a constrained SKU + accelerated networking narrows the eligible host set to almost nothing, triggering OverconstrainedAllocationRequest.
- Restarting a deallocated VM into a now-full cluster. When you stop (deallocate) a VM, you release its host. Starting it later requires a new allocation, and when the VM shares an availability set with VMs that are still running, Azure attempts that allocation in the set’s original cluster, which may have filled up in the meantime.
- Availability set fragmentation. Adding a VM to an existing availability set forces placement within the same cluster; if that cluster is full for your SKU, allocation fails even when the region has capacity elsewhere.
- Spot eviction with no capacity. A Spot VM that was evicted cannot be re-allocated until spare capacity for that SKU is available again in that region.
How to Reproduce the Error
The error is non-deterministic because it depends on live capacity, so you cannot trigger it on demand, but you can make it much more likely by maximizing constraints against a hot region. Request a large or specialized SKU, pin it to a single zone, and add accelerated networking plus a proximity placement group during a busy period:
# Illustration only - this is the kind of over-constrained request that triggers it.
# A constrained SKU + single zone + PPG + accelerated NIC in a busy region.
az vm create \
--resource-group rg-demo \
--name vm-capacity-test \
--image Ubuntu2204 \
--size Standard_D8s_v5 \
--zone 1 \
--ppg my-ppg \
--accelerated-networking true \
--location eastus
The more constraints you stack on a single popular SKU, the more likely Azure cannot satisfy them all simultaneously.
Diagnostic Commands
All of the following are read-only. They tell you which SKUs are offered in which zones and whether restrictions apply, before you retry a deployment. They do not report live capacity.
# Which D-family SKUs exist in the region, in which zones, with what restrictions?
az vm list-skus --location eastus --size Standard_D \
--query "[].{name:name, zones:locationInfo[0].zones, restrictions:restrictions}" \
--output table
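# Check SKU availability in zone 1 (restrictions show NotAvailableForSubscription).
# --zone is a true/false flag that keeps only zone-capable SKUs; the zone number is matched in the query.
az vm list-skus --location eastus --zone --size Standard_D \
--query "[?contains(locationInfo[0].zones, '1')].{name:name, restrictions:restrictions}" --output table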
# Confirm this is NOT a quota issue - shows current vs limit per family.
az vm list-usage --location eastus --output table
# List regions you can deploy to, so you can pick an alternate location.
az account list-locations \
--query "[].{name:name, displayName:displayName}" --output table
If restrictions is empty for your SKU in the target zone but allocation still fails, you are looking at a pure live-capacity shortage rather than a subscription-level block.
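The check described above can be scripted so a pipeline can tell a subscription-level block from a live-capacity shortage. This is a sketch under assumptions: the helper names sku_restrictions and sku_is_unrestricted are hypothetical, and empty output is also what you get when the SKU name does not exist in the region at all, so confirm that the SKU is listed first.

```shell
#!/bin/sh
# Print the restriction reason codes reported for one exact SKU name.
# $1 = location, $2 = exact SKU name (for example Standard_D8s_v5)
sku_restrictions() {
  az vm list-skus --location "$1" --size "$2" --resource-type virtualMachines \
    --query "[?name=='$2'].restrictions[].reasonCode" --output tsv
}

# Succeed (exit 0) when no restriction is reported, fail (exit 1) otherwise.
# Caveat: a SKU that is not listed in the region also yields empty output.
sku_is_unrestricted() {
  [ -z "$(sku_restrictions "$1" "$2")" ]
}
```

A call such as `sku_is_unrestricted eastus Standard_D8s_v5 && echo "no restriction reported"` only rules out restrictions; it says nothing about whether capacity is free at that moment.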
Step-by-Step Resolution
- Try a different VM size or family. Switch from a constrained SKU to a sibling in the same family (for example Standard_D8s_v5 to Standard_D8s_v4, or to a _v3 size). Older generations often have more capacity headroom.
- Try a different region or zone. If you hit ZonalAllocationFailed, drop the zone pin or try zones 2 and 3. If the whole region is short, move to another region (for example eastus2 or westus3).
- Remove over-constraints. For OverconstrainedAllocationRequest, strip constraints one at a time: drop the proximity placement group, remove the zone pin, or disable accelerated networking, then retry.
- Deallocate, then start or recreate so the platform re-places. For a stopped VM that won’t start, fully deallocate it and start again so Azure can pick a different host. If that still fails, redeploy or recreate the VM so it is placed into a cluster with free capacity.
- Use flexible orchestration and capacity reservations. Virtual Machine Scale Sets with Flexible orchestration let Azure spread instances across the region instead of pinning them to a single cluster. For guaranteed capacity, create an on-demand capacity reservation for the SKU/zone ahead of time, then deploy against it.
- Retry later. Capacity is reclaimed continuously. An automated retry with exponential backoff often succeeds within minutes for transient shortages.
- Consider a Spot fallback for non-critical workloads. If you only need best-effort compute, Spot VMs may be able to allocate from spare capacity, though they remain subject to eviction.
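The first step in the list above, falling back to a sibling size, can be sketched as a loop over candidate SKUs. The helper name create_with_fallback and the candidate order are illustrative assumptions. The sketch treats any az vm create failure as a reason to move on, whereas real automation should inspect the error code first, and a failed attempt may leave partial resources (NIC, public IP, or a failed VM) that need cleanup before the next try.

```shell
#!/bin/sh
# Try each candidate size in order and stop at the first that allocates.
# $1 = resource group, $2 = VM name, $3 = location, remaining args = sizes
create_with_fallback() {
  cwf_rg=$1; cwf_name=$2; cwf_location=$3
  shift 3
  for cwf_size in "$@"; do
    echo "Trying $cwf_size in $cwf_location" >&2
    if az vm create --resource-group "$cwf_rg" --name "$cwf_name" \
         --location "$cwf_location" --image Ubuntu2204 \
         --size "$cwf_size" --output none; then
      echo "$cwf_size"   # report which size succeeded
      return 0
    fi
  done
  return 1               # every candidate failed
}
```

A call might look like `create_with_fallback rg-demo vm-capacity-test eastus Standard_D8s_v5 Standard_D8s_v4`. The same loop shape extends to zones or regions by iterating over those values instead.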
For a broader playbook on automating these retries and capacity checks across environments, see our cloud operations guides.
Prevention and Best Practices
- Standardize on well-supplied SKUs and keep a documented fallback family per region for your IaC modules.
- Reserve capacity for critical tiers using on-demand capacity reservations so production never races for hosts.
- Avoid unnecessary zone pinning. Let the platform choose unless you have a strict zonal HA design.
- Prefer Flexible-orchestration scale sets over availability sets for new workloads to reduce cluster fragmentation.
- Build retry-with-backoff and multi-region fallback into your provisioning automation rather than failing hard on the first allocation error.
- Validate SKU/zone availability with az vm list-skus in your pipeline before attempting large rollouts.
Related Errors
- ZonalAllocationFailed — the zone-scoped form of this error; capacity is unavailable in the specific availability zone you pinned. Remove or change the zone.
- OverconstrainedAllocationRequest — too many simultaneous constraints (PPG, zone, networking, SKU) cannot be satisfied together. Relax constraints.
- SkuNotAvailable — the VM size is not offered in that region/zone for your subscription at all (a restriction), distinct from a transient capacity shortage. Pick a supported SKU or region.
- OperationNotAllowed — typically a quota limit: your subscription’s vCPU allowance for that family is exceeded. Request a quota increase rather than changing capacity.
- AllocationFailed (resize variant) — resizing a running VM to a larger SKU can fail because the current cluster lacks capacity for the target size; stop-deallocate the VM first, then resize so Azure can move it.
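For the resize variant, the stop-deallocate-then-resize sequence can be sketched as below. The function name resize_after_deallocate is hypothetical, and the sketch assumes downtime is acceptable, since the VM is offline from the deallocate step until it starts again.

```shell
#!/bin/sh
# Deallocate first so Azure is free to place the VM on a different cluster,
# then resize and start. Each step runs only if the previous one succeeded.
# $1 = resource group, $2 = VM name, $3 = target size
resize_after_deallocate() {
  az vm deallocate --resource-group "$1" --name "$2" &&
  az vm resize --resource-group "$1" --name "$2" --size "$3" &&
  az vm start --resource-group "$1" --name "$2"
}
```

Example: `resize_after_deallocate rg-demo vm-capacity-test Standard_D16s_v5`. If the start step still fails with AllocationFailed, the target size is short on capacity in that location, and the SKU, zone, or region options above apply.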
Frequently Asked Questions
What is the difference between AllocationFailed and a quota error?
AllocationFailed means Microsoft has no physical capacity available right now for your request in that location. A quota error (OperationNotAllowed) means your subscription’s configured limit for that VM family is reached. Quota is fixed by requesting an increase; capacity is fixed by changing SKU, region, zone, or retrying. Use az vm list-usage to rule out quota first.
Why does a stopped VM fail to start with AllocationFailed?
Deallocating a VM releases its physical host back to the pool. Starting it later requires a new allocation, and when the VM shares an availability set with VMs that are still running, Azure attempts that allocation in the set’s original cluster, which may have filled up while the VM was stopped. Constrained SKUs and zone-pinned VMs are especially exposed. Deallocate (for an availability set, every VM in the set) and start again to let Azure re-place it, or recreate it in a cluster with free capacity.
Do capacity reservations guarantee my VM always allocates?
An on-demand capacity reservation reserves compute capacity for a specific SKU, region, and optionally zone before you deploy, so subsequent deployments against it are protected from transient shortages. You pay for the reserved capacity whether or not VMs occupy it, which is the trade-off for guaranteed availability for critical workloads.
How long should I wait before retrying?
Capacity churns constantly, so an exponential backoff starting around 30-60 seconds and capping at a few minutes is usually effective for transient shortages. If retries keep failing for 15+ minutes, the region is genuinely constrained for that SKU, so switch SKU, zone, or region instead of waiting.
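The timing guidance in the last answer can be sketched as a retry wrapper. The helper names backoff_schedule and retry_with_backoff are hypothetical, and the values in the usage line (30-second start, 240-second cap, 900-second budget) are one reading of "30-60 seconds", "a few minutes", and "15+ minutes", not tuned settings.

```shell
#!/bin/sh
# Print the wait in seconds before each retry: start at $1, double each
# time, cap at $2, and stop once the total wait would exceed the budget $3.
backoff_schedule() {
  bs_delay=$1; bs_cap=$2; bs_budget=$3; bs_total=0
  while [ $((bs_total + bs_delay)) -le "$bs_budget" ]; do
    echo "$bs_delay"
    bs_total=$((bs_total + bs_delay))
    bs_delay=$((bs_delay * 2))
    if [ "$bs_delay" -gt "$bs_cap" ]; then bs_delay=$bs_cap; fi
  done
  return 0
}

# Run a command, retrying on failure according to the schedule above.
# $1 $2 $3 = initial delay, cap, budget; remaining args = command to run
retry_with_backoff() {
  rb_initial=$1; rb_cap=$2; rb_budget=$3
  shift 3
  "$@" && return 0
  for rb_pause in $(backoff_schedule "$rb_initial" "$rb_cap" "$rb_budget"); do
    sleep "$rb_pause"
    "$@" && return 0
  done
  return 1   # budget exhausted: switch SKU, zone, or region instead
}
```

Usage might look like `retry_with_backoff 30 240 900 az vm start --resource-group rg-demo --name vm-capacity-test`. A non-zero exit after the budget is spent is the signal to change placement rather than keep waiting.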