AWS Error Guide: 'Rate exceeded' CloudFormation API Throttling During Deployments

Exact Error Message

An error occurred (Throttling) when calling the DescribeStacks operation (reached max retries: 4): Rate exceeded

During a deployment it can appear inside stack events as a resource-level failure:

Resource handler returned message: "Rate exceeded (Service: AmazonEC2; Status Code: 503; Error Code: RequestLimitExceeded)"

The error code is Throttling (or ThrottlingException) with the message Rate exceeded.

What the Error Means

Every AWS API, including CloudFormation and the services it provisions, enforces request-rate limits. Rate exceeded means you sent CloudFormation (or a downstream service it calls on your behalf) more API requests per second than the account and region currently allow, so AWS returned a throttling response instead of processing the call. This is a rate limit, not a resource quota: the request itself is valid, it is simply arriving too fast. Retrying the same call a moment later usually succeeds, which is the key distinction from a quota error that fails no matter how long you wait.

These limits are enforced per account and per region, and most use a token-bucket model: a steady refill rate plus a burst allowance. Short spikes are absorbed, but sustained traffic above the refill rate drains the bucket and every subsequent call throttles until it refills. That is why a deployment can run fine and then suddenly start failing once a burst exhausts the bucket.
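To make the token-bucket behaviour concrete, here is a toy simulation in shell. The capacity, refill rate, and request rate are made-up numbers chosen for illustration, not real CloudFormation limits, and simulate_bucket is a hypothetical helper name.

```shell
# Toy token-bucket model. All numbers passed in are illustrative
# assumptions, not published AWS limits.
# Usage: simulate_bucket CAPACITY REFILL_PER_SEC REQUESTS_PER_SEC SECONDS
simulate_bucket() {
  local capacity=$1 refill=$2 rate=$3 seconds=$4
  local tokens=$capacity sec=1 served throttled
  while [ "$sec" -le "$seconds" ]; do
    # Refill once per second, never above the bucket capacity.
    tokens=$(( tokens + refill ))
    if [ "$tokens" -gt "$capacity" ]; then tokens=$capacity; fi
    # Serve as many requests as there are tokens; the rest are throttled.
    if [ "$rate" -lt "$tokens" ]; then served=$rate; else served=$tokens; fi
    throttled=$(( rate - served ))
    tokens=$(( tokens - served ))
    echo "second=$sec served=$served throttled=$throttled"
    sec=$(( sec + 1 ))
  done
}
```

Running simulate_bucket 10 2 5 4 models a bucket of 10 tokens refilling at 2 per second while a client sends 5 requests per second. The first two seconds are served in full, then the third and fourth seconds report 1 and 3 throttled requests as the burst allowance runs out.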

It commonly appears when many stacks deploy in parallel, when CI pipelines poll stack status aggressively, or when a large template provisions many resources that each call a downstream service. Because the budget is shared across the whole account, one noisy pipeline can throttle unrelated deployments running at the same time.

Common Causes

Aggressive status polling. A pipeline calls DescribeStacks/DescribeStackEvents in a tight loop while waiting. Describe APIs share the same budget as the operations you care about, so a busy-wait loop can starve the create-stack calls it is monitoring.
Many concurrent stack operations. Dozens of create-stack/update-stack or StackSet operations launch at once. Each generates its own stream of describe and provisioning calls, so concurrency multiplies request volume rather than just adding to it.
Large templates fan out downstream calls. A template creating hundreds of resources drives a burst of EC2/IAM/S3 API calls. CloudFormation provisions independent resources in parallel, so a single large stack can momentarily flood a downstream API.
Shared account-level rate budget. Multiple pipelines and engineers share the same per-region API limits. The throttle you hit may be caused by someone else’s deployment entirely, making these failures feel intermittent.
No retry with backoff. Clients retry immediately on throttle, amplifying the load. A tight retry loop turns one throttled call into a storm that keeps the bucket empty.
StackSets across many accounts/regions. High concurrency settings hammer the API. StackSets fan operations out to every target at once, so an aggressive concurrency percentage can throttle the management account and targets together.
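
As a rough illustration of why polling interval and stack count dominate request volume, the arithmetic can be sketched in shell. describe_calls_per_minute is a hypothetical helper, and it assumes exactly one describe call per poll per stack, which understates pipelines that also fetch stack events.

```shell
# Estimate describe calls per minute for a fleet of stacks being polled.
# Usage: describe_calls_per_minute STACKS POLL_INTERVAL_SECONDS
describe_calls_per_minute() {
  local stacks=$1 interval=$2
  # Each stack is polled (60 / interval) times per minute.
  echo $(( stacks * 60 / interval ))
}
```

For 20 stacks, describe_calls_per_minute 20 1 prints 1200, while describe_calls_per_minute 20 30 prints 40.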

How to Reproduce the Error

Poll stack status in a tight loop without delay while a deployment runs:

# Repeated, rapid describe calls can trip the rate limit (read-only)
for i in $(seq 1 200); do
  aws cloudformation describe-stacks --stack-name demo-stack \
    --query 'Stacks[0].StackStatus' --output text
done

Under load this returns:

An error occurred (Throttling) when calling the DescribeStacks operation (reached max retries: 4): Rate exceeded

Launching many stacks simultaneously produces the same throttle from create-stack calls.

Diagnostic Commands

Confirm the caller and the profile's configured region:

aws sts get-caller-identity
aws configure get region

Check the current state without aggressive polling (a single call, not a loop):

aws cloudformation describe-stacks --stack-name demo-stack \
  --query 'Stacks[0].StackStatus' --output text

Inspect stack events for downstream throttling messages:

aws cloudformation describe-stack-events --stack-name demo-stack \
  --query "StackEvents[?ResourceStatusReason && contains(ResourceStatusReason, 'Rate exceeded')].[LogicalResourceId,ResourceStatusReason]" \
  --output table

List in-flight operations that may be competing for the rate budget:

aws cloudformation list-stacks \
  --stack-status-filter CREATE_IN_PROGRESS UPDATE_IN_PROGRESS \
  --query 'StackSummaries[].StackName' --output text

Check the retry mode configured for the CLI (no output means none is set and the CLI's default applies):

aws configure get retry_mode

Step-by-Step Resolution

Stop tight polling. Replace busy loops with the CLI wait commands (for example, aws cloudformation wait stack-create-complete --stack-name demo-stack), or back your polling off to several seconds between calls. The CloudFormation waiters poll every 30 seconds, so swapping a hand-rolled loop for aws cloudformation wait is usually the single biggest reduction in request volume you can make.
Enable adaptive retries. Set the CLI to adaptive mode so it backs off automatically:
```
aws configure set retry_mode adaptive
aws configure set max_attempts 10
```
In SDKs, enable the standard/adaptive retry strategy with exponential backoff and jitter. Adaptive mode goes further than fixed retries by tracking throttle responses and slowing the client before it hits the limit again.
Reduce concurrency. Serialize or batch stack operations, lower the StackSet MaxConcurrentPercentage (or MaxConcurrentCount) operation preference, and keep RegionConcurrencyType at SEQUENTIAL so fewer calls land per second. For a fleet of stacks, a small fixed concurrency (say three to five at a time) usually finishes faster overall than launching everything at once and fighting throttles.
Stagger pipelines. Avoid launching many deployments at the same minute; spread them out or add a global concurrency gate. A shared deployment lock or queue prevents independent pipelines from summing their request rates into a throttle none of them would hit alone.
Split giant templates. Break a monolithic template into smaller nested stacks so downstream API bursts are spread over time. Smaller stacks also roll back faster, shrinking the blast radius.
Verify by re-running the deployment with backoff enabled; transient Rate exceeded responses should be absorbed by retries and no longer fail the operation. Confirm the retries are succeeding rather than just deferring the failure — if calls still fail after backoff, concurrency is the next lever.
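
For commands or scripts that the CLI's own retry settings do not cover, the backoff-with-jitter idea can be sketched as a small shell wrapper. This is an illustrative helper, not part of the AWS CLI: retry_with_backoff and the SLEEP_CMD override are hypothetical names, it assumes bash for RANDOM, and it retries on any non-zero exit rather than only on throttling.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_SECONDS CAP_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, sleeping a random whole number of seconds between attempts.
# Assumes bash, which provides RANDOM.
retry_with_backoff() {
  local max_attempts=$1 base=$2 cap=$3
  shift 3
  local attempt=1 ceiling delay
  while true; do
    if "$@"; then return 0; fi
    if [ "$attempt" -ge "$max_attempts" ]; then return 1; fi
    # Exponential ceiling: base, 2*base, 4*base, ... capped at CAP seconds.
    ceiling=$(( base * (1 << (attempt - 1)) ))
    if [ "$ceiling" -gt "$cap" ]; then ceiling=$cap; fi
    # Full jitter: pick a delay anywhere in [0, ceiling].
    delay=$(( RANDOM % (ceiling + 1) ))
    # SLEEP_CMD is an assumed override hook so the wait can be stubbed out.
    ${SLEEP_CMD:-sleep} "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

For example, retry_with_backoff 6 2 60 aws cloudformation describe-stacks --stack-name demo-stack would try up to six times, waiting a random 0 to 2, then 0 to 4, then 0 to 8 seconds and so on, never more than 60. Randomizing the whole delay rather than adding a small offset is what keeps many clients from retrying in step.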

Prevention and Best Practices

Always enable exponential backoff with jitter on AWS clients; throttling is expected and retryable. Jitter prevents many clients from retrying in lockstep and re-creating the burst they were just throttled for.
Use built-in wait commands instead of hand-rolled polling loops that hammer DescribeStacks. The CloudFormation waiters poll at a fixed 30-second interval, so you inherit sensible polling for free.
Cap concurrency for parallel stack and StackSet operations to stay within account rate budgets. Treat concurrency as a deliberate setting rather than letting your CI runner launch as many jobs as it has capacity for.
Stagger scheduled deployments so pipelines do not all fire simultaneously. Cron jobs that trigger on the hour are a classic source of synchronized bursts; offsetting their schedules removes the spike.
Decompose very large templates into nested stacks to flatten downstream API bursts, which also makes failures faster to recover from.
Monitor for Throttling/Rate exceeded in CloudTrail and alert on spikes so you can tune concurrency before deploys fail.
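
One way to treat concurrency as a deliberate setting is to feed stack names through xargs with a fixed parallelism. A minimal sketch, assuming a hypothetical per-stack script (deploy-stack.sh in the comment is a placeholder, not an AWS tool) and an xargs that supports -P, as the GNU and BSD versions do:

```shell
# deploy_with_cap MAX_PARALLEL COMMAND [ARGS...]
# Reads one stack name per line on stdin and runs "COMMAND ARGS... <stack>"
# for each, with at most MAX_PARALLEL invocations running at a time.
# Example (deploy-stack.sh is a hypothetical script you would supply):
#   printf '%s\n' stack-a stack-b stack-c | deploy_with_cap 3 ./deploy-stack.sh
deploy_with_cap() {
  local max_parallel=$1
  shift
  xargs -n 1 -P "$max_parallel" "$@"
}
```

Because the cap is an explicit argument, it stays the same no matter how many jobs the CI runner could otherwise start.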

Related Errors

RequestLimitExceeded: the EC2-specific throttling code that appears in stack events.
ThrottlingException: the same condition from other services’ SDKs.
TooManyRequestsException: Lambda/API Gateway’s throttling variant.
LimitExceeded (LimitExceededException in CloudFormation): a resource quota (count) rather than a request-rate limit, which needs a different fix.

Frequently Asked Questions

Is Rate exceeded the same as a quota limit? No. Rate exceeded is a requests-per-second limit and is retryable with backoff — the same call usually succeeds moments later. A quota limit caps the count of resources and requires a Service Quotas increase; retrying it will fail every time until the quota is raised. Knowing which one you hit tells you whether to add backoff or open a quota request.

Should I request a limit increase? For pure API rate limits, no. They are largely managed by AWS and not user-configurable, so the fix is backoff and reduced concurrency, not a quota request. AWS Support can sometimes adjust account-level API limits for a genuine high-throughput need, but that is a last resort after you have tuned the client.

Why does it happen only under parallel deploys? Concurrent operations and aggressive polling sum to more requests per second than the shared account budget allows. A single deployment stays under the limit, but five running together easily drain the token bucket and start throttling each other.

Does the CLI retry automatically? Yes, but the default may be too few attempts to ride out a sustained throttle; the "reached max retries: 4" in the error above is the CLI giving up after its default retries. Set retry_mode adaptive and raise max_attempts so the CLI keeps backing off long enough for the rate limit to recover.

How do I tell which downstream service is throttling? Read the stack events with describe-stack-events and look at the ResourceStatusReason; AWS embeds the offending service and error code (for example Service: AmazonEC2; Error Code: RequestLimitExceeded) right in the message.
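
Pulling the service and error code out of such a message can be sketched with sed. This assumes the reason string follows the "Service: ...; Error Code: ..." layout shown in this guide; other resource handlers may format their messages differently, and parse_throttle_reason is a hypothetical helper name.

```shell
# parse_throttle_reason: reads ResourceStatusReason text on stdin and prints
# "<service> <error code>" for each line that contains both fields.
# Lines without both fields produce no output.
parse_throttle_reason() {
  sed -n -E 's/.*Service: ([^;]+); .*Error Code: ([^;)]+).*/\1 \2/p'
}
```

Piping the resource-level message from the top of this guide through parse_throttle_reason prints AmazonEC2 RequestLimitExceeded.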

How do I stop my pipeline from causing this? Use wait commands, add backoff, and cap concurrency. See the AWS guides for deployment-throttling patterns.
