GCP Error Guide: 'Quota CPUS exceeded. Limit: 24.0 in region us-central1' Regional vCPU Limit

Exact Error Message

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.

Request ID: 0xb3f2a91c7d4e
Operation type: insert
HTTP status: 403 (PERMISSION_DENIED / quotaExceeded)

You may also see this surfaced from Terraform, a Managed Instance Group autoscaler event, or a GKE node-pool scale-up that silently fails to add nodes.

What the Error Means

Compute Engine enforces a per-region allocation quota named CPUS. It caps the total number of vCPUs across all running instances in a single region, regardless of machine family. The default for many regions, including us-central1, is 24 vCPUs for newer projects.

The error is raised when the vCPU count of the instance you are creating, added to the vCPUs already in use in that region, would exceed the limit. For example, if you already have 22 vCPUs running and try to launch an n2-standard-4 (4 vCPUs), the request lands at 26 and is rejected with the 403 above.
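
The check described above is plain arithmetic, and a minimal sketch of it follows. The function name would_exceed_quota is hypothetical and exists only for illustration; it is not part of gcloud. It strips the trailing ".0" because the limit is reported as a decimal (24.0) and bash only does integer math.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a gcloud command): succeeds (exit 0) when the
# request would push regional usage past the quota limit.
# Args: vCPUs in use, vCPUs requested, quota limit ("24.0" or "24").
would_exceed_quota() {
  local in_use=$1 requested=$2
  local limit=${3%.*}   # drop the ".0" so integer arithmetic works
  (( in_use + requested > limit ))
}

# The example from the text: 22 vCPUs in use, n2-standard-4 requested, limit 24.
if would_exceed_quota 22 4 24.0; then
  echo "rejected: $((22 + 4)) vCPUs requested against a limit of 24"
else
  echo "accepted"
fi
```

A request that lands exactly on the limit (for example 20 in use plus 4 requested) is accepted; only a total above the limit is rejected.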

Two important details:

The quota is regional, not zonal. Spreading instances across us-central1-a, -b, and -c does not help; they all draw from the same regional pool.
It is an allocation quota (a cap on how many vCPUs you can hold at one time), not a rate quota. Retrying with backoff will not clear it. You must either free vCPUs or raise the limit.

Common Causes

Genuine growth. More workloads have been deployed and the cumulative vCPU count crossed the default 24.
Large machine types. A single n2-standard-32 consumes 32 vCPUs and exceeds the default on its own.
Autoscalers scaling up. A GKE node pool or a Managed Instance Group tries to add nodes and hits the ceiling. The instances never appear, and the cluster looks “stuck” pending.
Orphaned instances. Instances left RUNNING after a failed teardown keep consuming CPUS quota. Instances in TERMINATED state do not.
Low default quota. Newer or low-trust projects start with low quotas (sometimes 8 or 12 vCPUs) that are easy to exhaust.
Preemptible/Spot plus on-demand. By default, preemptible and Spot vCPUs count toward CPUS alongside on-demand vCPUs. If the project has been granted a separate PREEMPTIBLE_CPUS quota in the region, they count against that quota instead.

How to Reproduce the Error

On a fresh project with the default 24 vCPU limit in us-central1:

# Fill the regional pool to the cap (6 x n2-standard-4 = 24 vCPUs)
for i in 1 2 3 4 5 6; do
  gcloud compute instances create "filler-$i" \
    --machine-type=n2-standard-4 \
    --zone=us-central1-a \
    --project=acme-prod-platform
done

# This seventh instance would push usage to 24 + 4 = 28 and fails:
gcloud compute instances create overflow-vm \
  --machine-type=n2-standard-4 \
  --zone=us-central1-b \
  --project=acme-prod-platform

The seventh create returns Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.

Diagnostic Commands

All of these are read-only. Run them to find the current limit, current usage, and which instances are consuming vCPUs.

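# Show every CPU-related quota (CPUS, N2_CPUS, ...) with its limit and live usage.
# --flatten puts each quota on its own row so grep can pick out single lines.
gcloud compute regions describe us-central1 \
  --project=acme-prod-platform \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.limit, quotas.usage)" \
  | grep -i cpus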

# List all running instances in the region with their machine types
gcloud compute instances list \
  --project=acme-prod-platform \
  --filter="zone~us-central1 AND status=RUNNING" \
  --format="table(name, zone, machineType.basename(), status)"

# Inspect a specific instance to confirm its vCPU count via machine type
gcloud compute instances describe filler-1 \
  --zone=us-central1-a \
  --project=acme-prod-platform \
  --format="value(machineType.basename())"

# Cross-reference how many vCPUs a machine type provides
gcloud compute machine-types describe n2-standard-4 \
  --zone=us-central1-a \
  --project=acme-prod-platform \
  --format="value(guestCpus)"

# Confirm the active project, in case you are hitting the wrong one
gcloud config get-value project
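
To turn the instance listing into a single number, the machine types can be summed locally. The sketch below is one way to do that. The function name sum_vcpus is hypothetical, and it relies on an assumption: that the vCPU count is the trailing number in predefined names such as n2-standard-4. Shared-core types (e2-micro), custom types, and names with other suffixes do not follow that pattern, so the function reports them on stderr and skips them. Look those up with machine-types describe instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum vCPUs from machine type names on stdin, one per line.
# Assumption: predefined types end in "-<vCPUs>" (n2-standard-4 -> 4).
# Custom types end in their memory size, so they are skipped rather than guessed.
sum_vcpus() {
  local total=0 mtype
  while read -r mtype; do
    [[ -z $mtype ]] && continue
    if [[ $mtype != *custom* && $mtype =~ -([0-9]+)$ ]]; then
      total=$(( total + ${BASH_REMATCH[1]} ))
    else
      echo "skipped (check guestCpus manually): $mtype" >&2
    fi
  done
  echo "$total"
}

# Literal input for illustration. In practice, pipe in the output of the
# instance listing with --format="value(machineType.basename())".
printf '%s\n' n2-standard-4 n2-standard-4 n2-standard-8 e2-micro | sum_vcpus
```

Compare the result with the usage column from the regions describe output. A large gap usually means skipped machine types or instances in a different project.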

Step-by-Step Resolution

You have two paths: free vCPUs immediately, or raise the quota. Use both.

1. Reclaim vCPUs you do not need. Delete or stop idle instances. Deleting is not required: a stopped instance releases its vCPUs once it reaches TERMINATED.

gcloud compute instances delete filler-5 \
  --zone=us-central1-a \
  --project=acme-prod-platform --quiet

2. Right-size oversized machines. Stop the instance, change to a smaller type, and restart.

gcloud compute instances stop big-vm --zone=us-central1-a --project=acme-prod-platform
gcloud compute instances set-machine-type big-vm \
  --machine-type=n2-standard-8 --zone=us-central1-a --project=acme-prod-platform
gcloud compute instances start big-vm --zone=us-central1-a --project=acme-prod-platform

3. Request a quota increase. Use the Quotas page (IAM & Admin -> Quotas) or the CLI to raise the regional CPUS limit. Increases under a few hundred vCPUs are often auto-approved within minutes.

# Submit an increase to 64 vCPUs for us-central1
gcloud quotas preferences create \
  --service=compute.googleapis.com \
  --quota-id=CPUS-per-project-region \
  --dimensions=region=us-central1 \
  --preferred-value=64 \
  --project=acme-prod-platform

4. Spread across regions. If one region is saturated and an increase is pending, deploy net-new workloads to us-east1 or us-west1, which have independent quota pools.

5. Re-run the failed operation. Once usage is under the limit (or the increase is granted), retry the create, the Terraform apply, or trigger the autoscaler again.

Prevention and Best Practices

Pre-flight quota checks in CI. Before a Terraform apply, query gcloud compute regions describe and fail the pipeline if projected usage exceeds the limit.
Request headroom early. Raise CPUS to 2-3x your steady-state need so autoscalers never stall during traffic spikes.
Alert on quota usage. Cloud Monitoring exposes serviceruntime.googleapis.com/quota/allocation/usage. Alert at 80% of the CPUS limit per region.
Tag and reap orphans. Label short-lived instances and run a scheduled job to delete anything past its TTL so quota is not silently consumed.
Treat autoscaler failures as quota signals. A MIG or GKE node pool stuck “scaling” with no new instances almost always means a quota wall. Route these to your on-call workflow via the Incident Response dashboard.
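
The pre-flight check and the 80% alert threshold can share one small gate, sketched below. The function name quota_gate and its argument order are hypothetical. It assumes you have already read the limit and usage for CPUS from gcloud compute regions describe and know how many vCPUs the pending change adds.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: fails when projected usage passes a threshold
# percentage of the regional CPUS limit.
# Args: limit, current usage, planned extra vCPUs, threshold percent (default 80).
quota_gate() {
  local limit=${1%.*} usage=${2%.*} planned=$3 threshold=${4:-80}
  if (( limit <= 0 )); then
    echo "invalid limit: $1" >&2
    return 2
  fi
  local projected=$(( usage + planned ))
  local pct=$(( projected * 100 / limit ))   # integer division, rounds down
  echo "projected ${projected}/${limit} vCPUs (${pct}%)"
  (( pct <= threshold ))
}

# Example: limit 24, 12 vCPUs in use, the change adds one n2-standard-4.
quota_gate 24.0 12.0 4 || { echo "quota gate failed" >&2; exit 1; }
```

Passing 100 as the fourth argument turns the gate into a hard check against the limit itself, which suits a Terraform pre-flight step. The default of 80 matches the alerting threshold suggested above.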

Related Errors

RESOURCE_EXHAUSTED / quota exceeded for IN_USE_ADDRESSES: out of external IPs in the region, not vCPUs.
ZONE_RESOURCE_POOL_EXHAUSTED: quota is fine, but the zone physically lacks capacity for the machine type.
Quota 'N2_CPUS' exceeded: per-family vCPU quota, separate from the aggregate CPUS quota.
Quota 'PREEMPTIBLE_CPUS' exceeded: Spot/preemptible vCPU ceiling in the region.
See more in the GCP error guides.

Frequently Asked Questions

Does the CPUS quota count stopped instances? No. Instances in TERMINATED state do not consume CPUS quota. Only RUNNING (and PROVISIONING/STAGING) instances count. If you stopped a VM and the quota did not drop, confirm it actually reached TERMINATED.

Why does spreading instances across zones not help? Because CPUS is a regional quota. All zones in us-central1 draw from the same vCPU pool, so moving instances between -a, -b, and -c has no effect on the limit.

How long does a quota increase take to approve? Small increases (up to a few hundred vCPUs) are frequently auto-approved within minutes. Larger requests are reviewed by Google and can take a business day or two, so request headroom before you need it.

Do preemptible and Spot VMs count toward CPUS? By default, yes. If the project has been granted a PREEMPTIBLE_CPUS quota in the region, Spot and preemptible vCPUs count against that quota instead, so confirm which limit applies before sizing either one.

My Terraform apply failed midway with this error. What state am I in? Terraform created the instances it could before hitting the wall, then errored. Fix the quota and re-run terraform apply; it is idempotent and will create only the remaining resources.

Is there a single quota for all machine families? The aggregate CPUS quota covers all families, but several families (N2_CPUS, C2_CPUS, etc.) also have their own per-family quotas. You must stay under both the aggregate and the family-specific limit.
