Terraform Error Guide: 'Error refreshing state'

Overview

Before planning, Terraform refreshes state by reading every managed resource from the provider. If the credentials it uses are expired, missing, or lack permission, those read calls fail with 403/AccessDenied/ExpiredToken, and Terraform reports Error refreshing state. The state file itself is usually fine — the problem is authentication or authorization at the provider (or the state backend) layer.

You will see this on terraform plan or apply:

Error: error refreshing state: 1 error occurred:
	* aws_instance.app: error reading EC2 Instance (i-0a1b2c3d4e5f): operation
	error EC2: DescribeInstances, https response error StatusCode: 403,
	api error UnauthorizedOperation: You are not authorized to perform this
	operation.

Or from expired session credentials:

Error: error configuring Terraform AWS Provider: validating provider
credentials: retrieving caller identity: operation error STS:
GetCallerIdentity, https response error StatusCode: 403, api error
ExpiredToken: The security token included in the request is expired.

The fix is to restore valid credentials (renew the token, fix the profile/role/region), not to touch the state.

Symptoms

plan/apply fail with Error refreshing state and a 403/AccessDenied/ExpiredToken.
It worked an hour ago and now fails (short-lived token expired).
A specific resource or the whole provider fails to authenticate.
The state backend itself returns 403 (separate from the resource provider).

terraform plan

Error: error configuring Terraform AWS Provider: ... api error ExpiredToken:
The security token included in the request is expired.

Common Root Causes

1. Expired short-lived (STS / SSO) credentials

A session token from aws sso login or aws sts assume-role has a fixed lifetime; once it lapses, every call returns ExpiredToken.

aws sts get-caller-identity

An error occurred (ExpiredToken) when calling the GetCallerIdentity operation:
The security token included in the request is expired.

Re-authenticate to refresh the token.

2. Wrong or missing AWS profile

Terraform picks up the default profile (or no profile at all) when the configuration needs a specific one.

echo "$AWS_PROFILE"
grep -rn 'profile' provider.tf

provider "aws" {
  profile = "prod"   # but AWS_PROFILE is unset / set to dev
}

A mismatch between the provider profile/env and the credentials you loaded yields 403.
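The comparison above can be scripted. The following is a minimal sketch, assuming the profile is written as a literal string in the provider file; tf_attr and check_profile are hypothetical helper names, and the match is plain text rather than real HCL parsing.

```shell
# tf_attr NAME FILE: print the first quoted value assigned to NAME in FILE.
# Hypothetical helper: a text match, not an HCL parser, so it ignores
# variables, commented-out lines, and provider aliases.
tf_attr() {
  grep -Eo "$1[[:space:]]*=[[:space:]]*\"[^\"]+\"" "$2" | head -n 1 | cut -d '"' -f 2
}

# check_profile FILE: compare the profile pinned in FILE with $AWS_PROFILE.
# Returns 1 on a mismatch so it can gate a wrapper script.
check_profile() {
  want=$(tf_attr profile "$1")
  have=${AWS_PROFILE:-}
  if [ -z "$want" ]; then
    echo "no profile pinned in $1"
  elif [ "$want" = "$have" ]; then
    echo "ok: profile $want"
  else
    echo "mismatch: provider wants '$want', AWS_PROFILE is '${have:-unset}'"
    return 1
  fi
}
```

Running check_profile provider.tf before a plan shows whether the credentials you just loaded on the command line are the ones the provider block will ask for.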

3. A broken assume-role chain

The provider assumes a role you no longer have permission to assume, or the role’s trust policy changed.

aws sts assume-role --role-arn arn:aws:iam::123456789012:role/tf-deploy \
  --role-session-name test

An error occurred (AccessDenied) when calling the AssumeRole operation: User
is not authorized to perform: sts:AssumeRole on resource: ...

4. Region mismatch

The provider points at a region where the resource does not exist, so the read typically returns not-found, or denied where the credentials are not permitted in that region.

grep -rn 'region' provider.tf
echo "$AWS_REGION $AWS_DEFAULT_REGION"

provider "aws" { region = "us-west-2" }

If the resources live in us-east-1, refresh reads fail or return nothing.
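To find out where a resource actually lives, one option is to probe a short list of candidate regions. This is a rough sketch: find_instance_region is a hypothetical helper, and the regions are whatever you pass in. A failed probe can also mean missing permissions, so a miss is a hint, not proof.

```shell
# find_instance_region INSTANCE_ID REGION...: print the first region in which
# `aws ec2 describe-instances` succeeds for the instance; return 1 if none does.
# Hypothetical helper: it cannot tell "not found" apart from "not authorized".
find_instance_region() {
  id=$1
  shift
  for region in "$@"; do
    if aws ec2 describe-instances --region "$region" --instance-ids "$id" >/dev/null 2>&1; then
      echo "$region"
      return 0
    fi
  done
  return 1
}

# Example (not run here):
#   find_instance_region i-0a1b2c3d4e5f us-west-2 us-east-1
```

If the region it prints differs from the one in the provider block, align the provider (or the environment variables) with the resource's region.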

5. The state backend itself returns 403

Reading or writing remote state (S3, GCS, azurerm) fails because the backend credentials lack access, for example s3:GetObject on the state object or DynamoDB access to the lock table on an S3 backend.

aws s3api head-object --bucket my-tf-state --key prod/terraform.tfstate

An error occurred (403) when calling the HeadObject operation: Forbidden

This is distinct from the resource provider — fix the backend’s IAM permissions.
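The head-object probe can be turned into a quick triage hint. This is a sketch only: classify_backend_probe and its labels are hypothetical, and it matches the CLI's message text, which may vary between versions. Note that S3 can answer 403 rather than 404 for a missing key when the caller lacks s3:ListBucket, so "access denied" does not prove the object exists.

```shell
# classify_backend_probe: read the combined output of
# `aws s3api head-object ... 2>&1` on stdin and print a one-line hint.
# Hypothetical helper; labels are illustrative.
classify_backend_probe() {
  probe=$(cat)
  case "$probe" in
    *ExpiredToken*)                       echo "backend: credentials expired" ;;
    *"(403)"*|*AccessDenied*|*Forbidden*) echo "backend: access denied" ;;
    *"(404)"*|*"Not Found"*)              echo "backend: object not found" ;;
    *'"ETag"'*)                           echo "backend: readable" ;;
    *)                                    echo "backend: unknown" ;;
  esac
}

# Example (not run here):
#   aws s3api head-object --bucket my-tf-state --key prod/terraform.tfstate 2>&1 \
#     | classify_backend_probe
```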

6. Insufficient IAM permissions to describe resources

Credentials are valid but the policy lacks Describe*/Get* on the resource being refreshed.

aws ec2 describe-instances --instance-ids i-0a1b2c3d4e5f

An error occurred (UnauthorizedOperation) when calling the DescribeInstances
operation: You are not authorized to perform this operation.

Diagnostic Workflow

Step 1: Determine whether it is authentication (expired) or authorization (denied)

terraform plan 2>&1 | grep -Ei 'ExpiredToken|AccessDenied|Unauthorized|403'

ExpiredToken -> renew credentials. AccessDenied/Unauthorized -> permissions/role/region.
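The same decision can be made by a small function instead of by eye. This is a minimal sketch; classify_tf_error and its three labels are hypothetical. It checks ExpiredToken first because an expired-token response also carries a 403 status.

```shell
# classify_tf_error: read Terraform output on stdin and print
# "expired", "denied", or "unknown". Hypothetical helper.
classify_tf_error() {
  tf_output=$(cat)
  case "$tf_output" in
    # Checked first: ExpiredToken responses also contain "StatusCode: 403".
    *ExpiredToken*) echo "expired" ;;
    *AccessDenied*|*UnauthorizedOperation*|*"StatusCode: 403"*) echo "denied" ;;
    *) echo "unknown" ;;
  esac
}

# Example (not run here):
#   terraform plan 2>&1 | classify_tf_error
```

"expired" means renew credentials; "denied" means look at permissions, role, or region.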

Step 2: Verify your identity end to end

aws sts get-caller-identity

If this fails, fix credentials first — nothing else will work. If it succeeds, note the ARN actually in use.
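Once you have the ARN, it is worth checking that it belongs to the account you meant to target. The sketch below assumes the standard colon-separated ARN layout, where the account ID is the fifth field; arn_account and expect_account are hypothetical helper names.

```shell
# arn_account ARN: print the account ID, the fifth colon-separated ARN field.
arn_account() {
  printf '%s\n' "$1" | cut -d ':' -f 5
}

# expect_account ARN EXPECTED: succeed only if the ARN belongs to EXPECTED.
expect_account() {
  actual=$(arn_account "$1")
  if [ "$actual" = "$2" ]; then
    echo "ok: account $actual"
  else
    echo "wrong account: got '$actual', expected '$2'"
    return 1
  fi
}

# Example (not run here):
#   expect_account "$(aws sts get-caller-identity --query Arn --output text)" 123456789012
```

get-caller-identity also returns the account ID directly in its Account field, which is simpler if the account is all you need.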

Step 3: Renew short-lived credentials

aws sso login --profile prod      # SSO
# or refresh the assume-role session your tooling uses

Confirm the token is fresh:

aws sts get-caller-identity --query Arn --output text

Step 4: Confirm profile and region match the config

grep -rn 'profile\|region\|assume_role' provider.tf
env | grep -E 'AWS_PROFILE|AWS_REGION|AWS_DEFAULT_REGION'

Align the env/profile/region with what the provider block expects.

Step 5: Test backend access separately, then re-plan

aws s3api head-object --bucket <STATE_BUCKET> --key <STATE_KEY>
terraform plan

A successful state read plus a fresh identity should let the refresh complete.

Example Root Cause Analysis

A plan that ran fine in the morning now fails after lunch:

Error: error configuring Terraform AWS Provider: validating provider
credentials: retrieving caller identity: operation error STS:
GetCallerIdentity, https response error StatusCode: 403, api error
ExpiredToken: The security token included in the request is expired.

The provider cannot even validate credentials, so this is authentication, not permissions. Confirming directly:

aws sts get-caller-identity

An error occurred (ExpiredToken) when calling the GetCallerIdentity operation:
The security token included in the request is expired.

The engineer’s SSO session has a multi-hour TTL that lapsed since the morning. No config changed; the token simply aged out.

Fix: re-authenticate and re-run.

aws sso login --profile prod
aws sts get-caller-identity --query Arn --output text

arn:aws:sts::123456789012:assumed-role/prod-admin/jane

With a fresh token, terraform plan refreshes state and completes normally. The durable fix is to wrap long sessions in a credential-refresh helper so applies do not die mid-run.
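One possible shape for such a helper is sketched below, assuming SSO-based profiles. ensure_fresh_credentials is a hypothetical name and the default profile "prod" is taken from the example above. It only checks at the start of a run; it does not renew a token that expires while Terraform is already applying.

```shell
# ensure_fresh_credentials [PROFILE]: verify the profile's credentials and,
# if the identity check fails, run `aws sso login` once and check again.
# Hypothetical helper; it treats any failed identity check as "needs login".
ensure_fresh_credentials() {
  profile=${1:-prod}
  if aws sts get-caller-identity --profile "$profile" >/dev/null 2>&1; then
    return 0
  fi
  echo "credentials for profile '$profile' are not valid; re-authenticating" >&2
  aws sso login --profile "$profile" || return 1
  aws sts get-caller-identity --profile "$profile" >/dev/null 2>&1
}

# Example (not run here):
#   ensure_fresh_credentials prod && terraform plan
```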

Prevention Best Practices

Run aws sts get-caller-identity (or the provider equivalent) as a pre-flight check in CI and wrappers so an expired token fails fast and clearly.
Use credential helpers / SSO auto-refresh so long applies cannot expire mid-run and orphan a state lock.
Pin profile, region, and assume_role explicitly in the provider block so behavior does not depend on ambient env vars.
Grant the deploy role the Describe*/Get* read permissions Terraform needs to refresh, plus the backend’s state-bucket access — separately.
Separate backend credentials from resource credentials in your mental model; a backend 403 is a different fix than a provider 403.

Quick Command Reference

# Auth vs. authz?
terraform plan 2>&1 | grep -Ei 'ExpiredToken|AccessDenied|Unauthorized|403'

# Who am I, really?
aws sts get-caller-identity

# Renew short-lived credentials
aws sso login --profile <PROFILE>

# Check profile/region/role vs. config
grep -rn 'profile\|region\|assume_role' provider.tf
env | grep -E 'AWS_PROFILE|AWS_REGION|AWS_DEFAULT_REGION'

# Test state backend access separately
aws s3api head-object --bucket <STATE_BUCKET> --key <STATE_KEY>

Conclusion

Error refreshing state with a 403/ExpiredToken is an authentication or authorization failure during the read phase — the state file is fine. The usual root causes:

Expired short-lived STS/SSO credentials.
A wrong or missing AWS profile.
A broken assume-role chain or trust policy.
A region mismatch between provider and resources.
The state backend itself returning 403.
Valid credentials lacking Describe*/Get* permissions.

Read whether it says expired (renew) or denied (fix role/region/permissions), verify your identity with get-caller-identity, and test the backend separately. Wrap long runs in credential auto-refresh so applies never die mid-flight.
