GCP Error Guide: 'RESOURCE_EXHAUSTED' Quota Exceeded (CPUS / IN_USE_ADDRESSES)

Overview

A RESOURCE_EXHAUSTED error means the project has hit a Google Cloud quota limit. Every project has per-region and per-service quotas (allocation quotas like CPUS or IN_USE_ADDRESSES, and rate quotas like API requests/minute). When a request would push usage past the limit, the API rejects it with HTTP 429 and a RESOURCE_EXHAUSTED status rather than provisioning the resource.

You will see this from gcloud or the client libraries:

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.

Or the address-allocation variant:

ERROR: (gcloud.compute.addresses.create) Could not fetch resource:
 - Quota 'IN_USE_ADDRESSES' exceeded. Limit: 8.0 in region us-central1.

The error occurs on creation and scale-up calls (instances, addresses, disks, GKE node pools, managed instance group scale-out) and on high-throughput API usage. Quotas are scoped per project and usually per region, so the same call can succeed in another region or project.

Symptoms

Create/scale calls fail with Quota '<METRIC>' exceeded. Limit: N in region <region>.
Managed instance groups and GKE autoscalers stall, logging quota errors instead of adding nodes.
Client libraries raise ResourceExhausted: 429.
The same operation succeeds in a different region or a different project.

gcloud compute instances create web-09 --zone us-central1-a --machine-type e2-standard-4

ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.

Common Root Causes

1. Regional CPU quota is fully consumed

Each region caps total vCPUs. New instances (or a larger machine type) push the sum past the limit.

gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)" \
  | grep -E 'CPUS|N2_CPUS|E2_CPUS'

CPUS        22.0  24.0
N2_CPUS     0.0   24.0

Usage is 22 of 24; adding an e2-standard-4 (4 vCPUs) would bring the total to 26, so the request is rejected.
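The rejection comes down to simple arithmetic: current usage plus the requested amount must not exceed the limit. A minimal sketch of that check follows. quota_fits is a hypothetical helper name, and the real enforcement happens on Google's side.

```shell
# quota_fits USAGE LIMIT REQUESTED
# Prints "fits" (exit 0) or "exceeds by N" (exit 1).
# Hypothetical local pre-check; it does not call any Google API.
quota_fits() {
  awk -v u="$1" -v l="$2" -v r="$3" 'BEGIN {
    total = u + r
    if (total <= l) { print "fits"; exit 0 }
    printf "exceeds by %g\n", total - l
    exit 1
  }'
}
```

With the numbers above, quota_fits 22 24 4 prints "exceeds by 2", while quota_fits 22 24 2 prints "fits".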

2. IN_USE_ADDRESSES limit reached

Static and ephemeral external IPs both count against IN_USE_ADDRESSES per region. Reserving a new external IP fails once the quota is full.

gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)" \
  | grep IN_USE_ADDRESSES

IN_USE_ADDRESSES  8.0  8.0

Usage equals the limit, so no further external addresses can be allocated in us-central1.

3. Orphaned / unattached resources still counting

Reserved-but-unused static IPs, stopped instances, or stranded resources continue to consume allocation quota.

gcloud compute addresses list --filter="status=RESERVED AND region:us-central1" \
  --format="table(name, address, status, users.scope())"

NAME           ADDRESS        STATUS    USERS
legacy-ip-1    34.x.x.x       RESERVED
legacy-ip-2    34.x.x.x       RESERVED

RESERVED addresses with no USERS are unattached but still counted; releasing them frees quota immediately.
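To turn that listing into a list of release candidates, the table can be filtered for RESERVED rows with an empty USERS column. This is a rough sketch that assumes the four-column layout shown above. unattached_reserved is a hypothetical helper name.

```shell
# unattached_reserved: read the address table on stdin
# (columns: NAME ADDRESS STATUS USERS) and print the names of
# RESERVED rows that have no USERS value.
# Assumes whitespace-separated columns, as in the sample output above.
unattached_reserved() {
  awk '$1 != "NAME" && $3 == "RESERVED" && NF == 3 { print $1 }'
}
```

Review the printed names before passing any of them to gcloud compute addresses delete, since a reserved address may be held on purpose.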

4. A specialized machine-family quota, not the general one

Some machine families have their own metric (N2_CPUS, C2_CPUS, GPU quotas). The general regional CPUS quota may have headroom while the family-specific quota is exhausted.

gcloud compute regions describe us-central1 \
  --format="value(quotas)" | tr ',' '\n' | grep -A2 C2_CPUS

'metric': 'C2_CPUS'
'limit': 0.0
'usage': 0.0

A C2_CPUS limit of 0.0 means c2 instances are blocked regardless of how much headroom the general CPUS quota has.
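Zeroed-out limits are easy to miss in a long quota listing. The sketch below assumes rows of the form "METRIC USAGE LIMIT", as in the table output used earlier in this guide, and prints every metric whose limit is zero. zero_limit_quotas is a hypothetical helper name.

```shell
# zero_limit_quotas: read "METRIC USAGE LIMIT" rows on stdin and print
# the metrics whose limit is 0. Header and non-numeric rows are skipped.
zero_limit_quotas() {
  awk '$3 ~ /^[0-9.]+$/ && $3 + 0 == 0 { print $1 }'
}
```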

5. Rate quota (requests/minute) exceeded

High-throughput callers can exhaust a per-minute API rate quota even with allocation quota to spare. These appear as transient 429s.

gcloud logging read \
  'protoPayload.status.code=8 AND protoPayload.serviceName="compute.googleapis.com"' \
  --project my-prod-project --limit 1 --format="value(protoPayload.status.message)"

Quota exceeded for quota metric 'Queries' and limit 'Queries per minute' of service compute.googleapis.com

A “per minute” message indicates a rate quota — back off and retry rather than requesting a permanent increase.
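Backing off can be as simple as wrapping the call in a retry loop that doubles its delay after each failure. The following is a minimal sketch, not a gcloud feature. retry_with_backoff is a hypothetical helper, and it retries on any non-zero exit status rather than inspecting the error for a 429.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the (integer) delay after every failed attempt.
retry_with_backoff() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 5 2 gcloud compute instances list would wait 2, 4, 8, and 16 seconds between attempts. Production callers usually add random jitter so that parallel clients do not retry in lockstep.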

6. The limit really is too low for the workload

Default quotas (especially for new projects, GPUs, or large regions) are conservative and must be raised for production scale.

gcloud compute regions describe us-central1 --project my-prod-project \
  --format="value(quotas)" | tr ',' '\n' | grep -A2 'SSD_TOTAL_GB'

'metric': 'SSD_TOTAL_GB'
'limit': 500.0

A 500 GB SSD limit is fine for dev but undersized for a fleet that needs terabytes, so request an increase.

Diagnostic Workflow

Step 1: Read the error for the exact metric, limit, and region

The message names all three, e.g. Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1. Everything else keys off the metric and region.
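When the error is captured in a script or log, the three fields can be pulled out with a small filter. This sketch assumes the exact message format quoted in this guide. parse_quota_error is a hypothetical helper name.

```shell
# parse_quota_error: read error text on stdin and print "METRIC LIMIT REGION"
# for each line that matches the quota message format shown in this guide.
# Lines that do not match produce no output.
parse_quota_error() {
  sed -n "s/.*Quota '\([A-Za-z0-9_]*\)' exceeded\. *Limit: \([0-9.]*\) in region \([a-z0-9-]*\).*/\1 \2 \3/p"
}
```

Piping the CPUS error above through parse_quota_error prints "CPUS 24.0 us-central1".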

Step 2: Confirm current usage vs. limit for that metric

gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)" \
  | grep -E '<METRIC>'

If usage is at the limit, or usage plus the requested amount would exceed it, this is an allocation quota that you must free or raise. If the message said “per minute,” it is a rate quota.
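That distinction can be automated with a simple text match on the message. The sketch below relies only on the two message shapes quoted in this guide; other wordings fall through to "unknown". classify_quota_error is a hypothetical helper name.

```shell
# classify_quota_error MESSAGE
# Prints "rate" for per-minute quota messages, "allocation" for
# "Quota '<METRIC>' exceeded" messages, and "unknown" otherwise.
classify_quota_error() {
  case "$1" in
    *"per minute"*)            echo "rate" ;;
    *"Quota '"*"' exceeded"*)  echo "allocation" ;;
    *)                         echo "unknown" ;;
  esac
}
```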

Step 3: Find what is consuming the quota

# CPUs: list instances in the region with their machine type and status
gcloud compute instances list --filter="zone~us-central1" \
  --format="table(name, machineType.basename(), status, zone.basename())"

# Addresses: list reserved/in-use IPs
gcloud compute addresses list --filter="region:us-central1" \
  --format="table(name, status, users.scope())"

Step 4: Reclaim wasted quota first

# Release an unattached static IP
gcloud compute addresses delete legacy-ip-1 --region us-central1

# Delete stopped/unneeded instances or downsize machine types
gcloud compute instances delete old-worker --zone us-central1-a

This often resolves the error without waiting on a quota request.

Step 5: If genuinely under-provisioned, request an increase

# List the quota and its current limit
gcloud alpha services quota list \
  --service=compute.googleapis.com \
  --consumer=projects/my-prod-project \
  --filter="metric:compute.googleapis.com/cpus" \
  --format="table(metric, quota.limit, dimensions)"

Then raise it via the Console (IAM & Admin → Quotas) or the Quotas API, selecting the region and the higher value. Increases are reviewed and may take minutes to days.

Example Root Cause Analysis

A GKE cluster’s autoscaler stops adding nodes during a traffic spike. The node pool events show:

ERROR: Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.

Checking regional usage:

gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)" | grep -E '^CPUS'

CPUS  24.0  24.0

The region is maxed at 24 vCPUs. Listing instances reveals two stopped n2-standard-8 instances left over from a migration that still count against quota:

gcloud compute instances list --filter="zone~us-central1 AND status=TERMINATED" \
  --format="table(name, machineType.basename(), status)"

NAME           MACHINE_TYPE   STATUS
migrate-old-1  n2-standard-8  TERMINATED
migrate-old-2  n2-standard-8  TERMINATED

Together those two hold 16 vCPUs (2 × 8). Deleting them frees that quota immediately:

gcloud compute instances delete migrate-old-1 migrate-old-2 --zone us-central1-a

The autoscaler resumes adding nodes. A quota increase to 48 vCPUs is also filed so future spikes have headroom.
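The 16 vCPU figure can be reproduced from the machine type names. This sketch assumes predefined type names of the form family-class-N, where N is the vCPU count (for example n2-standard-8). Shared-core and custom types are skipped. sum_vcpus is a hypothetical helper name.

```shell
# sum_vcpus: read machine type names on stdin, one per line, and print
# the total vCPU count. Only names shaped like family-class-N are counted;
# anything else (e2-micro, custom types) is ignored.
sum_vcpus() {
  awk -F- 'NF == 3 && $NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}
```

Feeding it the two n2-standard-8 rows from the listing above prints 16.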

Prevention Best Practices

Monitor quota usage proactively: alert when any allocation metric exceeds ~80% of its limit so you raise it before a launch fails.
File quota increase requests ahead of planned scale events; approvals are not instant, and a large regional request can take days.
Sweep for orphaned static IPs and stopped instances on a schedule; they silently hold quota and are the most common avoidable cause.
Distribute workloads across regions where it makes sense so a single region’s quota is not a hard ceiling.
Track machine-family-specific quotas (N2_CPUS, GPU types) separately; the general CPUS number can hide a zeroed-out family limit.
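The 80% alert suggested above can be prototyped as a filter over the regional quota table before wiring up real monitoring. This sketch assumes rows of the form "METRIC USAGE LIMIT". quota_over_threshold is a hypothetical helper name, and 80 is only the default.

```shell
# quota_over_threshold [PERCENT]
# Read "METRIC USAGE LIMIT" rows on stdin and print each metric whose
# usage is at or above PERCENT of its limit (default 80).
# Header rows, non-numeric rows, and zero limits are skipped.
quota_over_threshold() {
  awk -v pct="${1:-80}" '
    $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $3 > 0 {
      used = 100 * $2 / $3
      if (used >= pct) printf "%s %.0f%%\n", $1, used
    }'
}
```

Given the rows "CPUS 22.0 24.0" and "IN_USE_ADDRESSES 8.0 8.0", it prints "CPUS 92%" and "IN_USE_ADDRESSES 100%". A scheduled job could run it against the table output of gcloud compute regions describe for each region in use.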

Quick Command Reference

# Read the metric/limit/region from the error message first

# Regional allocation quotas (usage vs limit)
gcloud compute regions describe <REGION> \
  --flatten="quotas[]" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)"

# Project-wide quotas
gcloud compute project-info describe --format="value(quotas)"

# What is using CPUs / addresses in the region
gcloud compute instances list --filter="zone~<REGION>"
gcloud compute addresses list --filter="region:<REGION>"

# Reclaim quota
gcloud compute addresses delete <NAME> --region <REGION>
gcloud compute instances delete <NAME> --zone <ZONE>

# Inspect a quota via the Service Usage API
gcloud alpha services quota list \
  --service=compute.googleapis.com --consumer=projects/<PROJECT> \
  --filter="metric:compute.googleapis.com/cpus"

# Rate-quota (429 per minute) errors in logs
gcloud logging read 'protoPayload.status.code=8' --project <PROJECT> --limit 5

Conclusion

A RESOURCE_EXHAUSTED / quota-exceeded error means a request would exceed a project quota for the metric and region named in the message. The usual root causes:

Regional CPU quota (CPUS) is fully consumed by running instances.
The IN_USE_ADDRESSES limit is reached for external IPs in the region.
Orphaned reserved IPs or stopped instances still count against allocation quota.
A machine-family-specific quota (N2_CPUS, C2_CPUS, GPUs) is exhausted while the general CPUS quota has room.
A per-minute rate quota is exceeded, calling for backoff rather than an increase.
The default limit is genuinely too low and needs a quota increase.

Read the exact metric and region from the error, reclaim wasted allocation first, and only file an increase once you have confirmed real usage justifies it.
