By James Joyner IV · 9 min read

Terraform Error Guide: 'timeout while waiting for state to become available'

Fix Terraform apply timeouts: raise the timeouts block, check the cloud console for the real status, handle throttling, and treat the underlying failure, not the symptom.

  • #terraform
  • #troubleshooting
  • #errors
  • #providers

Exact Error Message

This family of errors appears during terraform apply, when a resource is created or updated but the provider’s wait loop gives up before the cloud reports the resource ready. The wording varies by provider and resource:

Error: timeout while waiting for state to become 'available'
(last state: 'creating', timeout: 40m0s)

  with aws_db_instance.main,
  on rds.tf line 1, in resource "aws_db_instance" "main":
   1: resource "aws_db_instance" "main" {

You will also encounter these closely related variants:

Error: error creating EKS Cluster (prod): context deadline exceeded

Error: timeout while waiting for resource to be created

Error: waiting for NAT Gateway (nat-0abc) create: timeout while waiting for
state to become 'available' (last state: 'pending', timeout: 10m0s)

The key fields are the last state (creating, pending, modifying) and the timeout value — together they tell you whether the resource was still making progress or genuinely stuck.
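
As a rough illustration, both fields can be pulled out of a captured error message with a few lines of shell. This is a sketch only: parse_timeout_error is a hypothetical helper name, not part of Terraform, and it assumes the message keeps the (last state: '...', timeout: ...) shape shown above, which may differ between provider versions.

```shell
# parse_timeout_error MESSAGE
# Hypothetical helper: prints the last state and timeout found in a
# Terraform timeout error, or returns 1 if the message has neither.
parse_timeout_error() {
  local msg state timeout
  # Join wrapped lines so both fields sit on a single line.
  msg=$(printf '%s' "$1" | tr '\n' ' ')
  # Capture the quoted value after "last state:".
  state=$(printf '%s' "$msg" | sed -n "s/.*last state: '\([^']*\)'.*/\1/p")
  # Capture the Go duration after "timeout:", up to the closing parenthesis.
  timeout=$(printf '%s' "$msg" | sed -n "s/.*timeout: \([0-9hms.]*\)).*/\1/p")
  if [ -z "$state" ] || [ -z "$timeout" ]; then
    return 1
  fi
  printf 'last_state=%s timeout=%s\n' "$state" "$timeout"
}
```

Fed the aws_db_instance error above, it prints last_state=creating timeout=40m0s. A last state of creating or pending that matches what the cloud still reports suggests progress; a state that has not changed for a long time suggests a stuck resource.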

What the Error Means

Many cloud resources are asynchronous: the create API returns immediately, then the provider polls the cloud until the resource reaches a target state such as available or ACTIVE. Each resource type has a timeout — a default baked into the provider, or one you set in a timeouts block. When the wait loop exceeds that timeout before reaching the target state, Terraform aborts with timeout while waiting... or, at the HTTP layer, context deadline exceeded.

Crucially, a timeout does not mean the resource failed to create. Terraform stopped watching; the cloud may still be working. If the provider had already recorded the resource ID, Terraform typically saves the resource in state marked as tainted, which means the next apply plans to destroy and recreate it, while the real object continues provisioning (or sits stuck on a real cloud-side error). That is why the fix is rarely “just raise the number”: you first have to learn what the cloud is actually doing.
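
The wait loop can be pictured with a small shell sketch. It is not the provider's actual code (providers implement this in Go); wait_for_state and its arguments are hypothetical names chosen for illustration. Note what happens at the deadline: the function reports an error and returns, and nothing is done to the resource being polled.

```shell
# wait_for_state TARGET TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND (which must print the current state)
# every INTERVAL_SECONDS until it prints TARGET or the timeout elapses.
wait_for_state() {
  local target="$1" timeout="$2" interval="$3"
  shift 3
  local waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state=$("$@")
    if [ "$state" = "$target" ]; then
      echo "reached '$target' after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  # The watcher gives up here; the polled resource is left untouched.
  echo "timeout while waiting for state to become '$target' (last state: '$state', timeout: ${timeout}s)" >&2
  return 1
}
```

One possible use, assuming the AWS CLI is configured: wait_for_state available 2400 30 aws rds describe-db-instances --db-instance-identifier demo --query 'DBInstances[0].DBInstanceStatus' --output text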

Common Causes

  • The resource is genuinely slow. RDS instances, Aurora clusters, EKS clusters, and CloudFront distributions routinely take 15-60 minutes, and a NAT gateway, though normally much quicker, can still outlast its shorter window; the default timeout is simply too short for your configuration.
  • A timeouts block is too short. A custom create/update/delete value, or the provider default, is below the real provisioning time.
  • The resource is stuck in pending from a real cloud-side error. Insufficient capacity, a bad parameter group, an unavailable AZ, or a quota limit leaves it never reaching available.
  • Rate limiting / throttling. The provider’s poll calls get ThrottlingException/429, slowing the loop until it exceeds the deadline.
  • A dependency is not ready. The resource waits on a subnet, IAM role, or security group that itself is delayed, so it never progresses.
  • Network to the cloud API is dropping. Flaky egress, VPN, or proxy issues cause the provider to lose poll responses and time out.

How to Reproduce the Error

Set an artificially short timeout on a resource that genuinely takes longer:

resource "aws_db_instance" "main" {
  identifier          = "demo"
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 100
  username            = "demo_admin"    # "admin" is a reserved word for the postgres engine
  password            = var.db_password # assumes a db_password variable is declared elsewhere
  skip_final_snapshot = true

  timeouts {
    create = "5m" # far shorter than RDS provisioning
  }
}

terraform apply -auto-approve

Error: timeout while waiting for state to become 'available'
(last state: 'creating', timeout: 5m0s)

The instance keeps creating in AWS even though Terraform has already given up.

Diagnostic Commands

Before changing any timeout, find out what the cloud actually reports:

# What is the real status in AWS, right now?
aws rds describe-db-instances --db-instance-identifier demo \
  --query 'DBInstances[0].DBInstanceStatus'

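# EKS cluster status plus any reported health issues
aws eks describe-cluster --name prod \
  --query 'cluster.{status: status, issues: health.issues}'

# Surface provider-level retries/throttling: write the debug log to a file
# (piping apply straight into grep would hide the approval prompt), then search it
TF_LOG=DEBUG TF_LOG_PATH=tf-debug.log terraform apply
grep -iE 'throttl|429|retry|deadline' tf-debug.log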

# See what Terraform currently tracks for the resource
terraform state show aws_db_instance.main

If the console shows available, Terraform just under-waited. If it shows failed, incompatible-parameters, or stays pending, you have a real cloud-side problem to fix first.
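
That decision can be written down as a small lookup. The sketch below is illustrative only: classify_status is a hypothetical helper, and the status lists cover just the values mentioned in this guide plus a few common RDS and EKS ones, not every status a service can return.

```shell
# classify_status STATUS
# Hypothetical helper: maps a cloud-reported status to a triage verdict.
classify_status() {
  case "$1" in
    available|ACTIVE)
      echo "under-waited: resource is ready, re-sync Terraform state" ;;
    failed|FAILED|incompatible-*)
      echo "broken: fix the cloud-side error before touching timeouts" ;;
    creating|CREATING|pending|modifying)
      echo "in-progress: still provisioning, or stuck if this never changes" ;;
    *)
      echo "unknown: inspect the resource in the console" ;;
  esac
}
```

For example, passing it the output of the describe-db-instances command above (with --output text added so the value is unquoted) turns the raw status into one of the three cases this guide distinguishes.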

Step-by-Step Resolution

1. Check the cloud console/CLI for the real status (above). Decide whether this is “slow but fine” or “actually broken.” Never raise a timeout to mask a genuine failure.

2. If the resource is healthy but slow, raise the timeouts block to a realistic value for that resource type:

resource "aws_db_instance" "main" {
  # ...
  timeouts {
    create = "60m"
    update = "80m"
    delete = "60m"
  }
}

For EKS or NAT gateways, similar generous windows apply:

resource "aws_eks_cluster" "prod" {
  # ...
  timeouts {
    create = "30m"
    delete = "30m"
  }
}

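3. Re-sync state and continue rather than recreating. If the resource finished after the timeout, refresh so Terraform picks up its real attributes. If the plan still wants to replace the resource because the failed create left it tainted, clear the mark with terraform untaint, then re-apply so Terraform reconciles:

terraform apply -refresh-only
terraform untaint aws_db_instance.main   # only if the plan shows the resource as tainted
terraform apply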

4. If throttling is the cause, raise provider retry limits so poll calls survive 429s:

provider "aws" {
  region      = "us-east-1"
  max_retries = 25
  retry_mode  = "adaptive"
}

5. If the resource is genuinely stuck, fix the underlying cause (capacity, quota, parameter group, AZ) in the cloud, then force a re-create. On Terraform v0.15.2 and later, the -replace option supersedes the deprecated terraform taint command:

terraform apply -replace="aws_db_instance.main"

6. For HTTP-level context deadline exceeded, check network/proxy to the cloud API and, where supported, increase the provider’s request timeout via environment configuration before retrying.
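
When throttling is in play (step 4), your own diagnostic CLI calls can be throttled too. A minimal sketch of exponential backoff around any command follows. retry_with_backoff is a hypothetical helper, and it retries on every non-zero exit, not only on throttling errors, so treat it as an illustration of the pattern rather than a drop-in tool.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay between attempts.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

One possible use: retry_with_backoff 5 2 aws eks describe-cluster --name prod --query 'cluster.status' waits 2, 4, 8, then 16 seconds between its five attempts.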

Prevention and Best Practices

  • Set explicit timeouts blocks on known-slow resources (RDS, Aurora, EKS, NAT gateways, CloudFront) instead of relying on provider defaults.
  • Treat a timeout as a signal to inspect the cloud, not an automatic “increase and retry” — raising the limit on a genuinely failing resource just wastes 60 minutes per run.
  • Enable retry_mode = "adaptive" and a higher max_retries in busy accounts to ride out API throttling.
  • Let Terraform order dependencies through attribute references, and add depends_on only for hidden dependencies that no reference expresses, so a resource never starts waiting before its prerequisites exist.
  • Split very large applies (or use -target for the slow resource) so one long-provisioning resource does not put the whole run at risk.
  • Run terraform apply -refresh-only after a timeout to let Terraform pick up resources that finished provisioning out of band.

Related Errors

  • context deadline exceeded — the lower-level Go HTTP timeout that underlies many of these waits; same root causes, raised at the API client layer.
  • Error: error waiting for ... deletion — the delete-side analogue; bump timeouts { delete = ... } and check for dependencies blocking teardown.
  • ThrottlingException / RequestLimitExceeded — the throttling that often causes the timeout; address it with provider retries.
  • InsufficientInstanceCapacity / quota exceeded — the genuine cloud-side failures that leave a resource stuck in pending until the timeout fires.

Frequently Asked Questions

Does a timeout mean Terraform failed to create the resource? No. It means Terraform stopped waiting. The cloud often continues provisioning and the resource becomes available minutes later. Check the console; if it is healthy, run terraform apply -refresh-only, untaint the resource if the plan shows it as tainted, and re-apply rather than recreating it.

Should I just keep increasing the timeout until it passes? Only if the resource is genuinely slow but progressing. If it is stuck in pending/failed due to a real error (quota, capacity, bad parameter group), a larger timeout only delays the same failure. Fix the cloud-side cause first.

Where do I set the timeout — provider or resource? Per-resource, in a timeouts { create/update/delete = "..." } block, using Go duration strings like "60m". Provider-level settings (max_retries, retry_mode) tune retry behavior for throttling, not the wait window.

My apply fails with context deadline exceeded instead of a state message. Is it the same thing? Usually yes — it is the HTTP client’s deadline rather than the resource wait loop. Check network/proxy reliability to the cloud API and provider retry settings, then re-run. Persistent deadlines often indicate dropped connectivity rather than a slow resource.

Can dependencies cause spurious timeouts? Yes. If a resource waits on a subnet, IAM role, or security group that is itself delayed, it can exhaust its own timeout before its prerequisite is ready. Add explicit depends_on and confirm the dependency reaches its ready state first.
