GCP with AI · By James Joyner IV · 9 min read

GCP Error Guide: 'ZONE_RESOURCE_POOL_EXHAUSTED' Zone Out of Capacity

Fix ZONE_RESOURCE_POOL_EXHAUSTED on Compute Engine: understand why a zone has no capacity for your machine type, fail over to other zones, and avoid stuck scale-ups.

  • #gcp
  • #troubleshooting
  • #errors
  • #compute

Exact Error Message

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The zone 'projects/acme-prod-platform/zones/us-central1-a' does not have
   enough resources available to fulfill the request. Try a different zone,
   or try again later.

errorCode: ZONE_RESOURCE_POOL_EXHAUSTED
HTTP status: 503
Operation type: insert

The same condition appears in Managed Instance Group autoscaler events and GKE node-pool scale-up logs as ZONE_RESOURCE_POOL_EXHAUSTED, usually with the requested instance or node never being created.

What the Error Means

ZONE_RESOURCE_POOL_EXHAUSTED means Google Cloud temporarily has no physical capacity of the requested resource (a specific machine type, CPU platform, or GPU) in that exact zone at that moment. Your quota is fine; the hardware simply is not available right now.

This is fundamentally different from a quota error. Quotas are a policy limit you can raise. ZONE_RESOURCE_POOL_EXHAUSTED is a real-world supply constraint inside one Google data center zone. No quota increase will fix it. It is also transient: the same request may succeed minutes later or in a neighboring zone.

It typically returns HTTP 503 (service-side, retryable) rather than 403, which is your first clue that this is capacity, not permissions or quota.
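
The capacity-versus-quota distinction can be made explicit in automation. The following is a minimal sketch that sorts the error codes covered in this guide into categories so a script can decide whether to retry, switch zones, or ask for more quota. The function name and the category labels are hypothetical and chosen for illustration only.

```shell
# Map a Compute Engine error code to the kind of problem it signals.
# The category labels (capacity, quota, rate, network, unknown) are names
# chosen for this sketch, not values returned by any Google API.
classify_gce_error() {
  case "$1" in
    # Prefix match, so any suffixed variant of the code is treated the same.
    ZONE_RESOURCE_POOL_EXHAUSTED*)    echo "capacity" ;;  # retry or switch zones
    QUOTA_EXCEEDED)                   echo "quota" ;;     # ask for a quota increase
    RESOURCE_OPERATION_RATE_EXCEEDED) echo "rate" ;;      # slow down the requests
    IP_SPACE_EXHAUSTED)               echo "network" ;;   # out of IP addresses
    *)                                echo "unknown" ;;
  esac
}

# Example: classify_gce_error "ZONE_RESOURCE_POOL_EXHAUSTED"
```

Only the "capacity" result calls for the zone failover and retry steps described in this guide; the other categories need a different fix.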

Common Causes

  • Popular or scarce machine types. New or specialized families (latest-gen c3, c4, large m3 memory-optimized) and GPU types (a100, h100) frequently run short in a given zone.
  • Single-zone deployments. Pinning a MIG, GKE node pool, or VM to one zone removes any fallback when that zone is constrained.
  • Spot/Preemptible demand. Spot capacity is the first to run out because it draws from a spare pool that Google reclaims first.
  • Large simultaneous bursts. Asking for many instances of the same type at once can exhaust the local pool even when single instances would succeed.
  • Reservations not used. If you have a capacity reservation but the instance is not configured to consume it, you compete for on-demand capacity and can still be denied.
  • Time-of-day / regional demand spikes. Capacity is shared across all customers; busy periods in popular zones (often us-central1-a, europe-west1-b) make exhaustion more likely.

How to Reproduce the Error

You cannot deterministically force this since it depends on Google’s live inventory, but the conditions that trigger it are reproducible: request a scarce machine type, pinned to a single popular zone, in bulk.

# Requesting many instances of a scarce/large type in one zone raises the
# odds of ZONE_RESOURCE_POOL_EXHAUSTED:
for i in 1 2 3 4 5 6 7 8; do
  gcloud compute instances create batch-gpu-$i \
    --machine-type=a2-highgpu-1g \
    --accelerator=type=nvidia-tesla-a100,count=1 \
    --zone=us-central1-a \
    --maintenance-policy=TERMINATE \
    --project=acme-prod-platform
done

When the zone’s A100 pool is depleted, the failing request returns the 503 shown above.

Diagnostic Commands

All of these commands are read-only. Use them to confirm the problem is capacity (not quota) and to find zones where the machine type is offered.

# Confirm the machine type even exists/is offered in the zone
gcloud compute machine-types describe a2-highgpu-1g \
  --zone=us-central1-a \
  --project=acme-prod-platform \
  --format="value(name, guestCpus, memoryMb)"

# List which zones in the region offer this machine type at all
gcloud compute machine-types list \
  --filter="name=a2-highgpu-1g AND zone~us-central1" \
  --project=acme-prod-platform \
  --format="table(name, zone)"

# Rule out a quota problem (capacity errors are NOT quota errors)
# --flatten emits one row per quota instead of one row holding every quota
gcloud compute regions describe us-central1 \
  --project=acme-prod-platform \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.limit, quotas.usage)"

# Review the failed operation's exact error code
gcloud compute operations list \
  --filter="operationType=insert AND status=DONE" \
  --project=acme-prod-platform \
  --format="table(name, targetLink.basename(), error.errors[0].code)"

# Check any existing reservations you might be able to consume
gcloud compute reservations list \
  --project=acme-prod-platform \
  --format="table(name, zone, specificReservation.count, status)"

# Confirm the active project
gcloud config get-value project
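
To turn the quota check into a yes/no answer, the quota rows can be filtered for any metric that has hit its limit. This is a sketch under assumptions: the function name `quotas_at_limit` is hypothetical, and it expects one quota per line as whitespace-separated metric, limit, and usage columns.

```shell
# Read "METRIC LIMIT USAGE" rows on stdin and print every metric whose usage
# has reached the given percentage of its limit (default: 100).
# Usage: quotas_at_limit [PERCENT]
quotas_at_limit() {
  awk -v pct="${1:-100}" '
    # Ignore header rows and zero limits; columns 2 and 3 must be numeric.
    $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $2 > 0 {
      if ($3 * 100 >= $2 * pct) print $1
    }'
}

# Example (assumes one quota per row, which --flatten="quotas[]" produces):
#   gcloud compute regions describe us-central1 \
#     --project=acme-prod-platform \
#     --flatten="quotas[]" \
#     --format="value(quotas.metric, quotas.limit, quotas.usage)" \
#     | quotas_at_limit 90
```

Empty output means no regional quota is near its limit, which supports the capacity diagnosis. Any metric printed is worth checking before blaming the zone.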

Step-by-Step Resolution

1. Retry with backoff. Because the error is transient and returned as a 503, retrying with exponential backoff (a few attempts spread over several minutes) often succeeds.
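
A minimal sketch of such a retry loop follows. The function name `retry_with_backoff` and its argument order are hypothetical. It retries on any non-zero exit, so it does not tell a capacity error apart from a quota or permission error.

```shell
# Run a command until it succeeds, doubling the wait between attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while :; do
    # Stop as soon as the wrapped command succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: up to 5 attempts, waiting 30s, 60s, 120s, then 240s between them.
#   retry_with_backoff 5 30 gcloud compute instances create batch-gpu-1 \
#     --machine-type=a2-highgpu-1g --zone=us-central1-a \
#     --maintenance-policy=TERMINATE --project=acme-prod-platform
```

If the last attempt still fails, treat that as the signal to move on to another zone rather than extending the loop.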

2. Try a different zone in the same region. This is the fastest reliable fix. Capacity is per-zone, so a sibling zone usually has stock.

gcloud compute instances create batch-gpu-1 \
  --machine-type=a2-highgpu-1g \
  --accelerator=type=nvidia-tesla-a100,count=1 \
  --zone=us-central1-b \
  --maintenance-policy=TERMINATE \
  --project=acme-prod-platform
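
Switching zones by hand can also be scripted. The sketch below walks a zone preference list and stops at the first zone where creation succeeds. The function name is hypothetical, and the zone order is whatever you pass in. It moves on after any failure, not only capacity errors, so a bad flag would fail in every zone.

```shell
# Try each zone in order and stop at the first one where creation succeeds.
# Usage: create_in_first_available_zone INSTANCE_NAME ZONE [ZONE...]
# Prints the zone that worked on stdout; returns 1 if every zone failed.
create_in_first_available_zone() {
  local name="$1" zone
  shift
  for zone in "$@"; do
    # gcloud's own output goes to stderr so stdout carries only the zone name.
    if gcloud compute instances create "$name" \
         --machine-type=a2-highgpu-1g \
         --accelerator=type=nvidia-tesla-a100,count=1 \
         --zone="$zone" \
         --maintenance-policy=TERMINATE \
         --project=acme-prod-platform >&2; then
      echo "$zone"
      return 0
    fi
    echo "zone $zone failed; trying the next one" >&2
  done
  return 1
}

# Example:
#   create_in_first_available_zone batch-gpu-1 us-central1-a us-central1-b us-central1-c
```

Listing only zones that the machine-types check above reports as offering the type keeps the loop from wasting attempts.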

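3. Let regional MIGs pick the zone for you. Use a regional Managed Instance Group so instances are spread across the region's zones instead of depending on one. The group's target distribution shape controls how: the default EVEN shape prioritizes an even spread, while BALANCED or ANY favor zones that currently have capacity.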

gcloud compute instance-groups managed create web-rmig \
  --template=web-template \
  --size=3 \
  --region=us-central1 \
  --project=acme-prod-platform

4. Reserve capacity ahead of time. For predictable critical workloads, create a capacity reservation so the hardware is held for you, then configure instances to consume it.

gcloud compute reservations create a100-pool \
  --vm-count=4 --machine-type=a2-highgpu-1g \
  --zone=us-central1-b --project=acme-prod-platform
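
Creating the reservation is only half the step; the instance also has to consume it. Here is a sketch of a helper that targets a reservation by name using the `--reservation-affinity` and `--reservation` flags of `gcloud compute instances create`. The function name is hypothetical, and it assumes the VM's machine type and zone match the reservation. Depending on how the reservation was created, targeting it by name may require that it was set up as a specifically targeted reservation, so check its settings before relying on this.

```shell
# Create a VM that targets a named reservation instead of on-demand capacity.
# Usage: create_from_reservation INSTANCE_NAME RESERVATION_NAME ZONE
create_from_reservation() {
  gcloud compute instances create "$1" \
    --machine-type=a2-highgpu-1g \
    --zone="$3" \
    --maintenance-policy=TERMINATE \
    --reservation-affinity=specific \
    --reservation="$2" \
    --project=acme-prod-platform
}

# Example: create_from_reservation batch-gpu-1 a100-pool us-central1-b
```

The reservation listing in the Diagnostic Commands section shows whether the reserved count is actually being used.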

5. Relax constraints. Fall back to an adjacent machine generation (for example c2 instead of c3) or use on-demand instead of Spot for the affected request when capacity matters more than price.

6. Route persistent failures to on-call. If a production scale-up keeps failing, escalate it through your Incident Response dashboard so the workload is failed over rather than left stuck pending.

Prevention and Best Practices

  • Default to multi-zone. Use regional MIGs and multi-zone GKE node pools so no single zone is a hard dependency.
  • Reserve scarce hardware. GPUs and the newest CPU families are the most exhaustion-prone; back critical jobs with capacity reservations.
  • Implement retry-and-failover in clients. On 503 / ZONE_RESOURCE_POOL_EXHAUSTED, retry with backoff, then automatically try the next zone in a preference list.
  • Pick less-contended zones. The lowest-letter zone (-a) in big regions is often the busiest because it is the common default; later-lettered zones can have more headroom, though this varies by region and over time.
  • Stagger large batches. Provision big fleets in smaller waves rather than all at once to avoid depleting a single zone’s pool.
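
The staggering advice above can be sketched as a small helper that splits a fleet into waves. The function name `wave_sizes` is hypothetical, and the right wave size and pause between waves depend on your workload.

```shell
# Print the size of each wave when TOTAL instances are provisioned
# at most WAVE_SIZE at a time.
# Usage: wave_sizes TOTAL WAVE_SIZE
wave_sizes() {
  local remaining="$1" wave="$2" n
  # A non-positive wave size would never make progress.
  [ "$wave" -gt 0 ] || return 1
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -lt "$wave" ]; then n=$remaining; else n=$wave; fi
    echo "$n"
    remaining=$((remaining - n))
  done
}

# Example: wave_sizes 8 3 prints one wave size per line.
```

A provisioning script can loop over these sizes, create that many instances, and pause before starting the next wave.
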

Related Errors

  • Quota 'CPUS' exceeded: a policy limit you can raise, not a capacity problem.
  • QUOTA_EXCEEDED for GPUs: you are out of GPU quota, which is distinct from the zone being out of GPU stock.
  • RESOURCE_OPERATION_RATE_EXCEEDED: too many operations too quickly; a rate limit, not a capacity limit.
  • IP_SPACE_EXHAUSTED: out of IP addresses, common on GKE and unrelated to compute pool capacity.

Frequently Asked Questions

Will a quota increase fix ZONE_RESOURCE_POOL_EXHAUSTED? No. This error means Google has no physical capacity in that zone for your request. Quotas are policy limits and are unrelated. Requesting a quota increase will not help; switching zones or retrying will.

Is this error permanent? No, it is transient. Zone capacity fluctuates minute to minute as other customers create and delete resources. The same request often succeeds shortly after, or immediately in a different zone.

Why does one zone fail while another succeeds? Capacity is tracked per zone, per machine type. Each zone is a separate set of physical hardware, so us-central1-a can be exhausted of a given type while us-central1-b has plenty.

How do I avoid this for GPU workloads? Use capacity reservations for predictable jobs and build zone-failover logic for ad hoc ones. GPUs are among the most frequently exhausted resources, so single-zone GPU deployments are fragile by design.

Does using a regional MIG eliminate the error? It greatly reduces it because the group spreads instances across the region's zones and, depending on its target distribution shape, can place new VMs in zones that still have capacity. It is not an absolute guarantee if every zone in the region is constrained, but it is the single most effective mitigation.
