GCP Error Guide: 'IP_SPACE_EXHAUSTED' GKE Secondary Range Out of IPs
Fix GKE IP_SPACE_EXHAUSTED: diagnose why a pod or service secondary range ran out of free IPs, add ranges, and size VPC-native clusters so scaling never stalls.
Exact Error Message
ERROR: googleapi: Error 400: IP_SPACE_EXHAUSTED: Instance
'gke-prod-cluster-default-pool-7a1c-9f2b' creation failed:
IP space of 'projects/acme-prod-platform/regions/us-central1/subnetworks/gke-subnet'
is exhausted. The secondary range 'pods' does not have enough free IP addresses
to allocate a /24 block to the node.
reason: IP_SPACE_EXHAUSTED
In the GKE console this surfaces as a node pool stuck in PROVISIONING, a scale-up that never completes, or pods stuck Pending with FailedScheduling and 0/N nodes available: insufficient pods.
What the Error Means
GKE VPC-native clusters use alias IP ranges. Each node draws pod IPs from a secondary range on the subnet (commonly named pods), and Services draw from another secondary range (commonly services). When a new node is created, GKE carves a CIDR block out of the pod secondary range (a /24 per node by default, i.e. 256 addresses for up to 110 pods).
IP_SPACE_EXHAUSTED means the relevant secondary range has no free address blocks left to satisfy the request. The cluster cannot place the node (or, for the services range, cannot allocate a ClusterIP), so scaling halts.
This is an addressing problem, not a quota or capacity problem. The fix is always about IP range sizing, not vCPUs or zones.
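The per-node block size described above can be worked out from the max-pods setting. The sketch below assumes GKE reserves the smallest power-of-two block that holds at least twice the maximum pod count, which matches the /24 for 110 pods quoted above. The function name pods_to_prefix is made up for illustration and is not part of any Google tooling.

```shell
# Per-node pod CIDR prefix for a given max-pods-per-node value.
# Assumption: the block is the smallest power of two that holds at
# least twice the max pod count (110 pods -> 256 addresses -> /24).
pods_to_prefix() {
  max_pods=$1
  needed=$((max_pods * 2))
  size=1
  prefix=32
  while [ "$size" -lt "$needed" ]; do
    size=$((size * 2))
    prefix=$((prefix - 1))
  done
  echo "$prefix"
}

for pods in 110 64 32 16 8; do
  echo "max-pods-per-node=$pods -> /$(pods_to_prefix "$pods")"
done
```

Under that assumption, 110 pods maps to a /24, 64 to a /25, and 32 to a /26.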
Common Causes
- Undersized pod secondary range. A /20 pod range (4,096 IPs) only supports 16 nodes at a /24 per node. Scaling past that exhausts it.
- Default per-node /24 is wasteful. Default --default-max-pods-per-node=110 reserves a /24 (256 IPs) per node even if you run far fewer pods, burning the range quickly.
- Many small node pools. Each node still consumes a full /24 block, so lots of nodes across pools drain the shared pod range.
- Services range too small. A tiny services secondary range exhausts when many ClusterIP Services are created.
- Shared VPC / overlapping CIDRs. In Shared VPC setups the secondary ranges are pre-carved by the network admin and may be smaller than the cluster needs.
- Autoscaler hitting the wall. Cluster Autoscaler tries to add nodes for Pending pods, but every attempt fails on IP exhaustion, leaving pods stuck indefinitely.
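The node ceiling behind the first cause is plain prefix arithmetic: a range of prefix length R holds 2^(N - R) blocks of prefix length N. A minimal sketch follows, with nodes_in_range as a hypothetical helper name; it ignores any addresses the range has already handed out.

```shell
# Number of per-node blocks that fit in a pod secondary range.
# Both arguments are prefix lengths without the slash, e.g. 20 and 24.
nodes_in_range() {
  range_prefix=$1
  node_prefix=$2
  if [ "$node_prefix" -lt "$range_prefix" ]; then
    echo 0   # a single node block is larger than the whole range
    return
  fi
  echo $(( 1 << (node_prefix - range_prefix) ))
}

echo "/20 range, /24 per node: $(nodes_in_range 20 24) nodes"
echo "/16 range, /24 per node: $(nodes_in_range 16 24) nodes"
echo "/16 range, /26 per node: $(nodes_in_range 16 26) nodes"
```

This gives 16, 256, and 1,024 nodes respectively.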
How to Reproduce the Error
Create a VPC-native cluster on a deliberately small pod range, then scale past what it can hold.
# Subnet with a small /24 pod secondary range = room for only ~1 node at /24 per node
gcloud compute networks subnets create gke-subnet \
--network=prod-vpc --region=us-central1 \
--range=10.0.0.0/24 \
--secondary-range=pods=10.4.0.0/24,services=10.5.0.0/24 \
--project=acme-prod-platform
gcloud container clusters create prod-cluster \
--region=us-central1 --enable-ip-alias \
--cluster-secondary-range-name=pods \
--services-secondary-range-name=services \
--num-nodes=1 --project=acme-prod-platform
# Scaling to multiple nodes exhausts the /24 pod range:
gcloud container clusters resize prod-cluster \
--node-pool=default-pool --num-nodes=4 \
--region=us-central1 --project=acme-prod-platform
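The additional nodes fail with IP_SPACE_EXHAUSTED. Because prod-cluster is regional, --num-nodes counts nodes per zone, so a /24 pod range may already be exhausted during cluster creation rather than at the resize.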
Diagnostic Commands
All read-only. Identify the secondary ranges, their sizes, and how the cluster consumes them.
# Show the cluster's IP allocation policy: which secondary ranges it uses + per-node pod count
gcloud container clusters describe prod-cluster \
--region=us-central1 --project=acme-prod-platform \
--format="yaml(ipAllocationPolicy)"
# Inspect the subnet's secondary ranges and their CIDR sizes
gcloud compute networks subnets describe gke-subnet \
--region=us-central1 --project=acme-prod-platform \
--format="yaml(ipCidrRange, secondaryIpRanges)"
# Re-run describe with info-level logging; the output lists the configured secondary CIDRs, not free vs used addresses per range
gcloud compute networks subnets describe gke-subnet \
--region=us-central1 --project=acme-prod-platform \
--format="default(secondaryIpRanges)" \
--verbosity=info
# List node pools with their initial node counts (each node consumes a /24 by default); use kubectl get nodes for the live count
gcloud container node-pools list \
--cluster=prod-cluster --region=us-central1 \
--project=acme-prod-platform \
--format="table(name, initialNodeCount, config.machineType)"
# Confirm the active project
gcloud config get-value project
To see the actual per-node CIDR assignments from inside the cluster, the read-only Kubernetes view is:
kubectl get nodes -o custom-columns="NODE:.metadata.name,PODCIDR:.spec.podCIDR"
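The podCIDR column can be turned into a rough utilization figure. The sketch below reads that two-column output on stdin, sums the addresses each node has reserved, and compares the total with the pod range size. It assumes every node draws from a single pod range, and pod_range_usage plus the sample node names are invented for illustration.

```shell
# Sum the addresses reserved by node podCIDRs read from stdin and report
# what share of the pod secondary range they occupy.
# $1 is the pod range prefix length without the slash (e.g. 20).
# Output: "<used> <total> <percent>", with the percentage rounded down.
pod_range_usage() {
  range_prefix=$1
  total=$(( 1 << (32 - range_prefix) ))
  used=0
  while read -r node cidr; do
    case "$cidr" in
      # Only count rows whose second column looks like a CIDR; this
      # skips the header row and nodes without an assigned podCIDR.
      */*) used=$(( used + (1 << (32 - ${cidr#*/})) )) ;;
    esac
  done
  echo "$used $total $(( used * 100 / total ))"
}

# Hypothetical sample in the shape of the kubectl output above.
pod_range_usage 20 <<'EOF'
NODE      PODCIDR
node-a    10.4.0.0/24
node-b    10.4.1.0/24
node-c    10.4.2.0/24
EOF
```

Against a live cluster, the same function could take the kubectl command's output through a pipe instead of the heredoc.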
Step-by-Step Resolution
1. Add a new pod secondary range to the subnet. You cannot resize an existing secondary range in place, but you can add an additional one and have GKE use it.
gcloud compute networks subnets update gke-subnet \
--region=us-central1 \
--add-secondary-ranges=pods-2=10.8.0.0/16 \
--project=acme-prod-platform
2. Attach the new range to the cluster (multi-pod-CIDR). On supported clusters, register the additional pod range so new nodes draw from it.
gcloud container clusters update prod-cluster \
--region=us-central1 \
--additional-pod-ipv4-ranges=pods-2 \
--project=acme-prod-platform
3. Create a node pool with fewer pods per node. Lowering --max-pods-per-node shrinks each node’s reserved block (e.g. 32 pods -> /26 instead of /24), multiplying how many nodes fit.
gcloud container node-pools create dense-pool \
--cluster=prod-cluster --region=us-central1 \
--max-pods-per-node=32 --num-nodes=3 \
--project=acme-prod-platform
4. For a fresh start, size the range correctly. Required IPs = (max nodes) x (per-node CIDR size). For 256 nodes at 32 pods/node (/26, 64 addresses each), you need 16,384 addresses, i.e. a /18 pod range; at 64 pods/node each node takes a /25 and the same fleet needs a /17. Build new clusters with a generously sized pod range from day one.
5. Re-run the scale-up. Once a range with free space is attached, resize the node pool or let the autoscaler retry; the previously Pending pods schedule.
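The sizing rule in step 4 can be sketched as a small calculation: round the node count up to a power of two, then subtract that many bits from the per-node prefix. The helper name required_range_prefix is hypothetical, and the result is a floor with no headroom for surge upgrades or extra node pools.

```shell
# Smallest pod range prefix that holds max_nodes blocks of /node_prefix.
# Arguments: max node count, per-node prefix length without the slash.
required_range_prefix() {
  max_nodes=$1
  node_prefix=$2
  blocks=1
  bits=0
  # Round the node count up to the next power of two.
  while [ "$blocks" -lt "$max_nodes" ]; do
    blocks=$((blocks * 2))
    bits=$((bits + 1))
  done
  echo $((node_prefix - bits))
}

echo "256 nodes at /26 per node: /$(required_range_prefix 256 26)"
echo "256 nodes at /24 per node: /$(required_range_prefix 256 24)"
echo "100 nodes at /26 per node: /$(required_range_prefix 100 26)"
```

The three cases come out as /18, /16, and /19; picking a range one or two bits larger than the result leaves room to grow.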
Prevention and Best Practices
- Right-size pod CIDR up front. Plan for peak node count and use the per-node pod count to compute the block size; a /16 pod range supports hundreds of nodes comfortably.
- Lower max-pods-per-node. If you run ~30 pods per node, set --max-pods-per-node=32 so each node uses a /26, not a /24. This is the single biggest lever.
- Monitor secondary-range utilization. Alert when the pod range exceeds ~75% allocation so you add a range before scale-ups fail.
- Plan Shared VPC ranges with the network team. In Shared VPC, secondary ranges are owned by the host project; agree on sizes that fit the cluster’s growth.
- Avoid range overlap. New secondary ranges must not overlap existing subnet, pod, or services CIDRs, or the update is rejected.
- Wire stuck scale-ups into on-call. A node pool stuck provisioning on IP exhaustion is a production risk; route it through your Incident Response dashboard.
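The overlap rule above can be checked locally before running a subnet update. The sketch below converts each CIDR to an integer interval and tests whether the intervals intersect. The function names are illustrative, the sample CIDRs are the ones used elsewhere in this guide, and the check only covers ranges you pass in, not every range in the VPC.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "start end" (inclusive, as integers) for a CIDR such as 10.8.0.0/16.
cidr_bounds() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  start=$(( base / size * size ))   # align to the block boundary
  echo "$start $(( start + size - 1 ))"
}

# Exit status 0 when the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

candidate=10.8.0.0/16
for existing in 10.0.0.0/24 10.4.0.0/24 10.5.0.0/24; do
  if cidrs_overlap "$candidate" "$existing"; then
    echo "$candidate overlaps $existing"
  else
    echo "$candidate is clear of $existing"
  fi
done
```

Here 10.8.0.0/16 is clear of all three existing ranges, whereas a candidate such as 10.4.0.0/14 would collide with both the pods and services CIDRs.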
Related Errors
- Quota 'IN_USE_ADDRESSES' exceeded: out of external IP quota, unrelated to internal pod ranges.
- ZONE_RESOURCE_POOL_EXHAUSTED: zone is out of compute capacity, not IPs.
- Insufficient pods / FailedScheduling: the symptom pods show while the underlying cause is IP_SPACE_EXHAUSTED.
- Range overlaps with existing range: raised when an added secondary range collides with an existing CIDR.
- More in the GCP error guides.
Frequently Asked Questions
Can I resize an existing secondary range to add more IPs?
No, secondary ranges cannot be expanded in place. You must add an additional secondary range to the subnet and attach it to the cluster as an extra pod range. Plan ranges generously from the start to avoid this.
Why does each node consume a whole /24 when I only run a few pods?
GKE reserves a fixed CIDR block per node based on --max-pods-per-node (default 110, which rounds up to a /24). The block is reserved regardless of how many pods actually run. Lowering the max-pods setting shrinks the per-node block.
How do I calculate the pod range size I need?
Multiply your maximum node count by the per-node block size. At the default /24 per node, a /16 range (256 blocks) holds about 256 nodes. Reducing per-node pods to use a /26 lets the same /16 hold roughly 1,024 nodes.
Does this error affect the services range too?
Yes. If the services secondary range is exhausted, new ClusterIP Services fail to get an IP. It is less common than pod exhaustion because Services consume one IP each rather than a block per node.
My autoscaler keeps failing silently. Is this the cause?
Often, yes. If pods are Pending and the autoscaler logs IP_SPACE_EXHAUSTED, it is trying to add nodes but cannot allocate pod CIDRs. Add a pod range; the queued pods will then schedule.
Can I change max-pods-per-node on an existing node pool?
No, it is fixed at node-pool creation. Create a new node pool with the lower value, migrate workloads to it, and delete the old pool.