GCP Error Guide: 'Quota CPUS exceeded. Limit: 24.0 in region us-central1' Regional vCPU Limit
Fix the Compute Engine CPUS quota exceeded error: find which instances consume regional vCPUs, request an increase, and prevent capacity stalls in us-central1.
Exact Error Message
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.
Request ID: 0xb3f2a91c7d4e
Operation type: insert
HTTP status: 403 (PERMISSION_DENIED / quotaExceeded)
You may also see this surfaced from Terraform, a Managed Instance Group autoscaler event, or a GKE node-pool scale-up that silently fails to add nodes.
What the Error Means
Compute Engine enforces a per-region allocation quota named CPUS. It caps the total number of vCPUs across all running instances in a single region, regardless of machine family. The default for many regions, including us-central1, is 24 vCPUs for newer projects.
The error is raised when the vCPU count of the instance you are creating, added to the vCPUs already in use in that region, would exceed the limit. For example, if you already have 22 vCPUs running and try to launch an n2-standard-4 (4 vCPUs), the request lands at 26 and is rejected with the 403 above.
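The check described above is plain addition compared against the limit. A minimal sketch of that arithmetic, using the numbers from the example; the function name `cpus_quota_check` is hypothetical and nothing here queries a live project:

```shell
# Hypothetical helper mirroring the arithmetic of the regional CPUS check.
# Arguments: vCPUs already in use, vCPUs requested, regional limit (whole numbers).
cpus_quota_check() {
  total=$(( $1 + $2 ))
  if [ "$total" -gt "$3" ]; then
    echo "exceeds: $total > $3"
  else
    echo "fits: $total <= $3"
  fi
}

cpus_quota_check 22 4 24   # prints "exceeds: 26 > 24"
cpus_quota_check 20 4 24   # prints "fits: 24 <= 24"
```

Landing exactly on the limit is allowed; only a total above it is rejected.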
Two important details:
- The quota is regional, not zonal. Spreading instances across us-central1-a, -b, and -c does not help; they all draw from the same regional pool.
- It is an allocation quota (a cap on how much of a resource you can hold at one time), not a rate quota. Retrying with backoff will not clear it. You must either free vCPUs or raise the limit.
Common Causes
- Genuine growth. More workloads have been deployed and the cumulative vCPU count crossed the default 24.
- Large machine types. A single n2-standard-32 consumes 32 vCPUs and exceeds the default on its own.
- Autoscalers scaling up. A GKE node pool or a Managed Instance Group tries to add nodes and hits the ceiling. The instances never appear, and the cluster looks “stuck” pending.
- Orphaned instances. Instances in TERMINATED state do not consume CPUS quota, but instances left RUNNING after a failed teardown do.
- Low default quota. Newer or low-trust projects start with low quotas (sometimes 8 or 12 vCPUs) that are easy to exhaust.
- Preemptible/Spot plus on-demand. Preemptible and Spot vCPUs count toward CPUS unless the project has been granted a separate PREEMPTIBLE_CPUS quota in that region, in which case they count against that quota instead.
How to Reproduce the Error
On a fresh project with the default 24 vCPU limit in us-central1:
# Launch instances until the regional pool is at the cap (6 x n2-standard-4 = 24 vCPUs)
for i in 1 2 3 4 5 6; do
  gcloud compute instances create "filler-$i" \
    --machine-type=n2-standard-4 \
    --zone=us-central1-a \
    --project=acme-prod-platform
done
# This seventh instance would push usage to 24 + 4 = 28 and fails:
gcloud compute instances create overflow-vm \
  --machine-type=n2-standard-4 \
  --zone=us-central1-b \
  --project=acme-prod-platform
The seventh create returns Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.
Diagnostic Commands
All of these are read-only. Run them to find the current limit, current usage, and which instances are consuming vCPUs.
# Show the limit and live usage of each CPU quota in the region, one row per metric
gcloud compute regions describe us-central1 \
  --project=acme-prod-platform \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.limit, quotas.usage)" \
  | grep -i cpus
# List all running instances in the region with their machine types
gcloud compute instances list \
  --project=acme-prod-platform \
  --filter="zone~us-central1 AND status=RUNNING" \
  --format="table(name, zone, machineType.basename(), status)"
# Inspect a specific instance to confirm its vCPU count via machine type
gcloud compute instances describe filler-1 \
  --zone=us-central1-a \
  --project=acme-prod-platform \
  --format="value(machineType.basename())"
# Cross-reference how many vCPUs a machine type provides
gcloud compute machine-types describe n2-standard-4 \
  --zone=us-central1-a \
  --project=acme-prod-platform \
  --format="value(guestCpus)"
# Confirm the active project, in case you are hitting the wrong one
gcloud config get-value project
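To compare the instance listing with the limit, the per-instance vCPU counts need to be added up. A sketch of that step, assuming the input has already been assembled as tab-separated `name<TAB>vcpus` lines (for instance by pairing each listed instance with the guestCpus of its machine type); `sum_vcpus` is a hypothetical helper, not a gcloud feature:

```shell
# Hypothetical helper: sums the second tab-separated column (vCPUs per instance).
# Lines with fewer than two fields are skipped; prints 0 when there is no input.
sum_vcpus() {
  awk -F '\t' 'NF >= 2 { total += $2 } END { print total + 0 }'
}

# Sample data (not read from a live project): six instances with 4 vCPUs each.
printf 'filler-%s\t4\n' 1 2 3 4 5 6 | sum_vcpus   # prints 24
```

If the total is well below the usage reported for the region, something outside the listing (another project filter, or instances still in PROVISIONING or STAGING) is likely holding the remainder.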
Step-by-Step Resolution
You have two paths: free vCPUs immediately, or raise the quota. Use both.
1. Reclaim vCPUs you do not need. Delete or stop instances that are idle. Stopping an instance releases its vCPUs just as deleting it does, once the instance reaches TERMINATED.
gcloud compute instances delete filler-5 \
  --zone=us-central1-a \
  --project=acme-prod-platform --quiet
2. Right-size oversized machines. Stop the instance, change to a smaller type, and restart.
gcloud compute instances stop big-vm --zone=us-central1-a --project=acme-prod-platform
gcloud compute instances set-machine-type big-vm \
  --machine-type=n2-standard-8 --zone=us-central1-a --project=acme-prod-platform
gcloud compute instances start big-vm --zone=us-central1-a --project=acme-prod-platform
3. Request a quota increase. Use the Quotas page (IAM & Admin -> Quotas) or the CLI to raise the regional CPUS limit. Increases under a few hundred vCPUs are often auto-approved within minutes.
# Submit an increase to 64 vCPUs for us-central1
gcloud quotas preferences create \
  --service=compute.googleapis.com \
  --quota-id=CPUS-per-project-region \
  --dimensions=region=us-central1 \
  --preferred-value=64 \
  --project=acme-prod-platform
4. Spread across regions. If one region is saturated and an increase is pending, deploy net-new workloads to us-east1 or us-west1, which have independent quota pools.
5. Re-run the failed operation. Once usage is under the limit (or the increase is granted), retry the create, the Terraform apply, or trigger the autoscaler again.
Prevention and Best Practices
- Pre-flight quota checks in CI. Before a Terraform apply, query gcloud compute regions describe and fail the pipeline if projected usage exceeds the limit.
- Request headroom early. Raise CPUS to 2-3x your steady-state need so autoscalers never stall during traffic spikes.
- Alert on quota usage. Cloud Monitoring exposes serviceruntime.googleapis.com/quota/allocation/usage. Alert at 80% of the CPUS limit per region.
- Tag and reap orphans. Label short-lived instances and run a scheduled job to delete anything past its TTL so quota is not silently consumed.
- Treat autoscaler failures as quota signals. A MIG or GKE node pool stuck “scaling” with no new instances almost always means a quota wall. Route these to your on-call workflow via the Incident Response dashboard.
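The pre-flight check and the 80% alert threshold from the list above can be combined in one small gate. This is a sketch under stated assumptions: the function name `preflight_cpus` is hypothetical, and the input is assumed to be rows of `METRIC LIMIT USAGE`, which is roughly what gcloud compute regions describe should emit with --flatten="quotas[]" and --format="value(quotas.metric, quotas.limit, quotas.usage)".

```shell
# Hypothetical CI gate. Reads "METRIC LIMIT USAGE" rows on stdin.
# $1 = vCPUs the deployment will add, $2 = warning threshold in percent.
# Exit status: 0 for OK or WARN, 1 for FAIL, 2 if no CPUS row was found.
preflight_cpus() {
  awk -v add="$1" -v warn="$2" '
    $1 == "CPUS" {
      found = 1
      rc = 0
      limit = $2 + 0
      projected = $3 + add
      if (projected > limit)                    { verdict = "FAIL"; rc = 1 }
      else if (projected * 100 >= limit * warn) { verdict = "WARN" }
      else                                      { verdict = "OK" }
      print verdict " " projected "/" limit
      exit rc
    }
    END { if (!found) { print "UNKNOWN"; exit 2 } }
  '
}

# Sample row (not live data): limit 24, 12 vCPUs in use, deployment adds 4.
printf 'CPUS\t24.0\t12.0\n' | preflight_cpus 4 80   # prints "OK 16/24"
```

In a pipeline, the non-zero exit status on FAIL is what stops the job before terraform apply runs; WARN leaves the job green but gives a line to surface in the logs.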
Related Errors
- RESOURCE_EXHAUSTED / quota exceeded for IN_USE_ADDRESSES: out of external IPs in the region, not vCPUs.
- ZONE_RESOURCE_POOL_EXHAUSTED: quota is fine, but the zone physically lacks capacity for the machine type.
- Quota 'N2_CPUS' exceeded: per-family vCPU quota, separate from the aggregate CPUS quota.
- Quota 'PREEMPTIBLE_CPUS' exceeded: Spot/preemptible vCPU ceiling in the region.
- See more in the GCP error guides.
Frequently Asked Questions
Does the CPUS quota count stopped instances?
No. Instances in TERMINATED state do not consume CPUS quota. Only RUNNING (and PROVISIONING/STAGING) instances count. If you stopped a VM and the quota did not drop, confirm it actually reached TERMINATED.
Why does spreading instances across zones not help?
Because CPUS is a regional quota. All zones in us-central1 draw from the same vCPU pool, so moving instances between -a, -b, and -c has no effect on the limit.
How long does a quota increase take to approve?
Small increases (up to a few hundred vCPUs) are frequently auto-approved within minutes. Larger requests are reviewed by Google and can take a business day or two, so request headroom before you need it.
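Do preemptible and Spot VMs count toward CPUS?
By default, yes. If the project has never been granted PREEMPTIBLE_CPUS quota in a region, Spot and preemptible vCPUs draw from the aggregate CPUS quota. Once PREEMPTIBLE_CPUS has been granted there, they count against that quota instead, so check which of the two limits the error message names.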
My Terraform apply failed midway with this error. What state am I in?
Terraform created the instances it could before hitting the wall, then errored. Fix the quota and re-run terraform apply; it is idempotent and will create only the remaining resources.
Is there a single quota for all machine families?
The aggregate CPUS quota covers all families, but several families (N2_CPUS, C2_CPUS, etc.) also have their own per-family quotas. You must stay under both the aggregate and the family-specific limit.