Azure Error Guide: '429 TooManyRequests' ARM Throttling

Overview

An Azure 429 TooManyRequests error occurs when Azure Resource Manager (ARM) or a resource provider rejects your request because you have exceeded a request-rate limit. ARM tracks a token bucket of remaining reads and writes per subscription (resource providers keep their own limits on top of that), and once a bucket is empty it returns 429 with a Retry-After header telling you how long to wait. The operation does not complete until you slow down and retry.

You will see this in a CLI response or pipeline log:

(TooManyRequests) The request is being throttled as the limit has been reached for operation 'GetVirtualMachine'. Please try again after '23' seconds.
Code: TooManyRequests
Message: The request is being throttled as the limit has been reached for operation 'GetVirtualMachine'.

And in the raw HTTP exchange (run the command with --debug), the response headers carry the remaining budget:

Response status: 429
Retry-After: 23
x-ms-ratelimit-remaining-subscription-reads: 0
x-ms-ratelimit-remaining-subscription-resource-requests: 0
x-ms-request-id: 8f2c1d44-2a17-49b8-9c0e-1d3f5a6b7c80

It occurs whenever requests arrive faster than the bucket refills, most often during large automation runs, tight polling loops, parallel terraform apply, or fan-out scripts that iterate over many resources. The limit is scoped per subscription, per region, and per operation type (reads, writes, deletes), so one noisy pipeline can throttle everything else in the same subscription.
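To make the refill-rate point concrete, here is a minimal bash sketch of a token bucket. The capacity, refill rate, and request rate are made-up illustrative numbers, not Azure's published limits, and ARM's real accounting is more involved; the sketch only shows why a sustained rate above the refill rate turns into a steady stream of 429s once the initial burst allowance is spent. The function name simulate_bucket is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified token-bucket model. Arguments are illustrative numbers, NOT Azure's limits.
# Each simulated second: every request takes a token if one is left, otherwise it is
# counted as throttled (a 429); then the bucket refills by $refill, capped at $capacity.
simulate_bucket() {
  local capacity=$1 refill=$2 rate=$3 seconds=$4
  local tokens=$capacity throttled=0 t i
  for (( t = 0; t < seconds; t++ )); do
    for (( i = 0; i < rate; i++ )); do
      if (( tokens > 0 )); then
        tokens=$(( tokens - 1 ))
      else
        throttled=$(( throttled + 1 ))
      fi
    done
    # Refill once per second, never above capacity.
    tokens=$(( tokens + refill > capacity ? capacity : tokens + refill ))
  done
  echo "$throttled"
}
```

With a 10-token bucket refilling 2 tokens per second, sending 5 requests per second for 10 seconds gives 22 throttled requests (simulate_bucket 10 2 5 10 prints 22), while sending 2 requests per second is never throttled (simulate_bucket 10 2 2 10 prints 0). The burst capacity only delays the first 429; the refill rate decides whether they keep coming.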

Symptoms

CLI or SDK calls fail intermittently with (TooManyRequests) and a Retry-After value.
terraform apply errors with Status=429 Code="TooManyRequests" partway through a plan.
Deployments stall and resume in bursts as the bucket refills.
The remaining-reads/writes header trends toward 0 under load.

az vm show --resource-group rg-prod --name web-01 --debug 2>&1 \
  | grep -iE 'x-ms-ratelimit-remaining-subscription|Retry-After|status: 429'

Response status: 429
Retry-After: 23
x-ms-ratelimit-remaining-subscription-reads: 0

az deployment group list --resource-group rg-prod -o table 2>&1 | tail -3

(TooManyRequests) The request is being throttled as the limit has been reached for operation 'ListDeployments'. Please try again after '17' seconds.

Common Root Causes

1. Subscription-level ARM read/write limits exhausted

ARM enforces a per-subscription budget for read and write requests. A burst of list/show calls drains the read bucket; bulk creates/updates drain the write bucket.

az vm list --query "length(@)" -o tsv
az group list --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-subscription-reads'

x-ms-ratelimit-remaining-subscription-reads: 4

A remaining count this close to 0 means the next handful of reads will return 429.

2. Per-resource-provider throttling

Each resource provider (Compute, Network, Storage) keeps its own limit independent of the ARM subscription bucket. Hammering one provider throttles only its operations.

az vm list -g rg-prod --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-resource'

x-ms-ratelimit-remaining-resource: Microsoft.Compute/HighCostGet3Min;0,Microsoft.Compute/HighCostGet30Min;312

The HighCostGet3Min counter at 0 shows the Compute provider’s short-window budget for expensive GETs is spent.
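When a provider reports several policies in one header, a small bash helper can list the ones that are nearly spent. This is a sketch that assumes the Policy;count,Policy;count layout shown above; the function name exhausted_policies and the default threshold of 10 are hypothetical choices. It also accepts a quoted 'header': 'value' line.

```bash
#!/usr/bin/env bash
# Print "policy count" for each entry of an x-ms-ratelimit-remaining-resource
# header whose remaining count is at or below a threshold (default 10, arbitrary).
exhausted_policies() {
  local header=$1 threshold=${2:-10}
  local value entry policy count
  local -a entries
  # Strip the header name, the colon, optional quotes, and trailing whitespace/CR.
  value=$(sed -E "s/.*x-ms-ratelimit-remaining-resource'?:[[:space:]]*'?//; s/'?[[:space:]]*\$//" <<< "$header")
  IFS=',' read -ra entries <<< "$value"
  for entry in "${entries[@]}"; do
    policy=${entry%%;*}   # text before the first ';'
    count=${entry##*;}    # text after the last ';'
    if (( count <= threshold )); then
      printf '%s %s\n' "$policy" "$count"
    fi
  done
}
```

Usage: az vm list -g rg-prod --debug 2>&1 | grep -i 'x-ms-ratelimit-remaining-resource' | while IFS= read -r line; do exhausted_policies "$line"; done. For the header above it prints Microsoft.Compute/HighCostGet3Min 0.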

3. Tight polling loops or high Terraform parallelism

A loop that polls a resource state with no delay, or terraform running its default 10 parallel operations against many resources, generates requests faster than the bucket refills.

terraform apply -parallelism=10 2>&1 | grep -iE '429|TooManyRequests' | head -3

Error: waiting for creation of Network Interface: Code="TooManyRequests" Message="The request is being throttled..."
Error: retrieving Virtual Machine: Status=429 Code="TooManyRequests"

Lowering -parallelism reduces the concurrent request rate against ARM.
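A back-of-the-envelope way to see why lowering -parallelism helps is Little's law: sustained request rate is roughly concurrency divided by average latency. The sketch below assumes each worker keeps one request in flight at a time with a hypothetical 500 ms round trip; it ignores provider-side polling delays and retries, so treat the result as a rough estimate. The function name estimate_rps is hypothetical.

```bash
#!/usr/bin/env bash
# Little's law: requests/second ~= concurrent workers / average seconds per request.
# Integer arithmetic; latency is given in milliseconds.
estimate_rps() {
  local parallelism=$1 latency_ms=$2   # latency_ms is an assumed average round trip
  echo $(( parallelism * 1000 / latency_ms ))
}

estimate_rps 10 500   # Terraform default parallelism -> prints 20
estimate_rps 3 500    # -parallelism=3 -> prints 6
```

Under these assumptions, dropping from 10 to 3 cuts the sustained rate from about 20 to about 6 requests per second, which is what lets the bucket refill faster than it drains.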

4. Large fan-out automation across many resources

A script that iterates over hundreds of resources, issuing a show/update per item with no batching or pacing, exhausts the bucket within seconds.

for rg in $(az group list --query "[].name" -o tsv); do
  az resource list -g "$rg" -o none
done 2>&1 | grep -ic 'TooManyRequests'

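37

37 throttled responses across the loop signals that the fan-out is outrunning the limit. Collapse the loop into one subscription-wide call (az resource list without -g, optionally filtered server-side with --resource-type) or a Resource Graph query, or add pacing. Adding --query alone does not help, because the CLI applies it client-side after the requests have been made.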
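If a per-item loop cannot be replaced with a single call, pacing it at least caps the request rate. A minimal bash sketch; paced_each and the PACE_SECONDS variable are hypothetical names, and the right delay depends on the remaining counts you observe in your own subscription.

```bash
#!/usr/bin/env bash
# Run "$@ <item>" once per non-empty stdin line, sleeping PACE_SECONDS between calls.
# PACE_SECONDS (default 1) is a hypothetical knob, not an Azure setting.
paced_each() {
  local pace=${PACE_SECONDS:-1} item
  while IFS= read -r item; do
    [ -n "$item" ] || continue
    "$@" "$item" < /dev/null   # keep the command from consuming the item list
    sleep "$pace"
  done
}
```

Usage: az group list --query "[].name" -o tsv | PACE_SECONDS=2 paced_each az resource list -o none -g. Each group name is appended after -g. Pacing spreads the same number of calls over time; collapsing the loop into one call still removes more load.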

5. Tenant-level throttling from shared identity

Multiple subscriptions or pipelines authenticating as the same service principal share tenant-scoped ARM limits (for example, management-group and other tenant-level reads), so unrelated jobs throttle each other.

az account show --query "{tenant:tenantId, sub:id}" -o json
az account management-group list --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-tenant-reads'

x-ms-ratelimit-remaining-tenant-reads: 2

A near-zero tenant-reads counter points at contention from other jobs using the same identity.

6. Missing exponential backoff / Retry-After ignored

Client code that retries immediately (or with a fixed tiny delay) instead of honoring the Retry-After header turns a transient 429 into a sustained throttle storm.

az vm get-instance-view -g rg-prod -n web-01 --debug 2>&1 \
  | grep -iE 'Retry-After|status: 429'

Response status: 429
Retry-After: 30

If your retry fires before the Retry-After: 30 window elapses, every retry is itself throttled.
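A retry wrapper that respects the server's wait could look like the bash sketch below. It assumes the Retry-After value can be read from the "try again after 'N' seconds" text in the CLI error message, as in the examples in this guide; retry_429, backoff_delay, and the MAX_RETRIES, BACKOFF_BASE, and BACKOFF_CAP variables are hypothetical names with illustrative defaults. Each wait is capped exponential backoff with full jitter, floored at the Retry-After value.

```bash
#!/usr/bin/env bash
# Wait before retry number $1 (0-based): capped exponential backoff with full jitter,
# never shorter than the Retry-After value in $2. Base/cap defaults are illustrative.
backoff_delay() {
  local attempt=$1 retry_after=${2:-0} base=${3:-2} cap=${4:-60}
  local exp=$(( base * (1 << attempt) ))
  if (( exp > cap )); then exp=$cap; fi
  local delay=$(( RANDOM % (exp + 1) ))   # full jitter: approximately uniform in [0, exp]
  if (( delay < retry_after )); then delay=$retry_after; fi
  echo "$delay"
}

# Run "$@" and retry only on TooManyRequests, up to MAX_RETRIES times (default 5).
# Retry-After is parsed from the CLI message text "try again after 'N' seconds".
retry_429() {
  local max=${MAX_RETRIES:-5} attempt=0 out ra
  while :; do
    if out=$("$@" 2>&1); then
      printf '%s\n' "$out"
      return 0
    fi
    # Give up on non-throttling errors, or once the retry budget is spent.
    if ! grep -qi 'TooManyRequests' <<< "$out" || (( attempt >= max )); then
      printf '%s\n' "$out" >&2
      return 1
    fi
    ra=$(grep -oE "after '[0-9]+' seconds" <<< "$out" | grep -oE '[0-9]+' | head -n 1)
    sleep "$(backoff_delay "$attempt" "${ra:-0}" "${BACKOFF_BASE:-2}" "${BACKOFF_CAP:-60}")"
    attempt=$(( attempt + 1 ))
  done
}
```

Usage: retry_429 az vm get-instance-view -g rg-prod -n web-01 -o none. Flooring each wait at Retry-After means a retry never fires inside the server's window, and the jitter spreads parallel clients so they do not retry in lockstep.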

Diagnostic Workflow

Step 1: Confirm it is a 429 and read the Retry-After value

az <command> --debug 2>&1 \
  | grep -iE 'status: 429|Retry-After|TooManyRequests'

Step 2: Identify which bucket is empty (subscription vs resource provider)

az <command> --debug 2>&1 \
  | grep -iE 'x-ms-ratelimit-remaining-(subscription|resource|tenant)'

The header at or near 0 tells you whether it is the ARM subscription bucket, a resource provider, or the tenant.
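To pick the emptiest bucket out of a long --debug log, a small bash filter can sort the integer-valued subscription and tenant headers. This is a sketch; lowest_bucket is a hypothetical name, and the provider header (x-ms-ratelimit-remaining-resource) is skipped because it uses the Policy;count format instead of a single integer.

```bash
#!/usr/bin/env bash
# Read `az ... --debug` output on stdin and print "header value" for the
# x-ms-ratelimit-remaining-{subscription,tenant}-* header with the lowest value.
lowest_bucket() {
  grep -oiE "x-ms-ratelimit-remaining-(subscription|tenant)-[a-z-]+'?:[[:space:]]*'?[0-9]+" \
    | sed -E "s/'//g; s/:[[:space:]]*/ /" \
    | sort -k2,2n \
    | head -n 1
}
```

Usage: az group list --debug 2>&1 | lowest_bucket. The header name in the output tells you the scope (subscription or tenant) and the operation type (reads, writes, deletes, resource-requests).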

Step 3: Find the operation and the caller generating the load

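az monitor activity-log list \
  --offset 1h \
  --query "[?status.value=='Failed'].{op:operationName.value, caller:caller, subStatus:subStatus.value}" \
  -o table

The activity log records write, delete, and action operations but not GETs, so throttled reads will not appear here. Use it to spot which caller is issuing bulk writes, and rely on the --debug headers from Step 2 for read throttling.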

Step 4: Reduce concurrency in the offending client

# Terraform: drop parallelism
terraform apply -parallelism=3
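# Ad-hoc loops: pace requests, or replace per-item calls with one subscription-wide call.
# --resource-type filters server-side; --query only reshapes the output client-side.
az resource list --resource-type Microsoft.Compute/virtualMachines --query "[].id" -o tsv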

Step 5: Verify the bucket recovers after backing off

sleep 30
az group list --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-subscription-reads'

A rising remaining-reads count confirms the bucket is refilling and the throttle has cleared.

Example Root Cause Analysis

A nightly Terraform pipeline that manages roughly 200 VMs starts failing midway with Status=429 Code="TooManyRequests", and other engineers report az vm list intermittently throttling in the same subscription.

The deployment log shows the operation:

Error: retrieving Virtual Machine "web-114": Status=429 Code="TooManyRequests" Message="The request is being throttled as the limit has been reached for operation 'GetVirtualMachine'. Please try again after '28' seconds."

A debug read shows that the Compute provider’s short-window budget is the bucket that ran dry:

az vm show -g rg-prod -n web-01 --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-resource'

x-ms-ratelimit-remaining-resource: Microsoft.Compute/HighCostGet3Min;0,Microsoft.Compute/HighCostGet30Min;88

The pipeline runs terraform apply -parallelism=10, and the provider issues per-VM GetVirtualMachine refreshes during plan. With 200 VMs and 10 concurrent refreshes, the Compute HighCostGet3Min bucket drains to 0, and because the retry logic ignored Retry-After, immediate retries kept the bucket empty.

Fix: lower parallelism so the request rate stays under the refill rate, and let retries honor the header:

terraform apply -parallelism=3
# confirm recovery
sleep 30
az vm show -g rg-prod -n web-01 --debug 2>&1 \
  | grep -i 'x-ms-ratelimit-remaining-resource'

x-ms-ratelimit-remaining-resource: Microsoft.Compute/HighCostGet3Min;297,Microsoft.Compute/HighCostGet30Min;1140

The bucket recovers, the apply completes, and the shared throttling on other engineers’ az vm list calls disappears.

Prevention Best Practices

Always honor the Retry-After header: retry only after the value elapses, and wrap clients in exponential backoff with jitter rather than fixed-delay loops.
Cap concurrency in automation: use terraform -parallelism and bounded worker pools so the request rate stays below the bucket’s refill rate.
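Batch reads instead of looping az resource show per item: make one subscription-wide list call, or use az graph query (Resource Graph), where one call can replace hundreds of ARM reads. Note that --query is applied client-side by the CLI and does not reduce the number of ARM requests.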
Isolate noisy pipelines into their own subscription where the workload allows, so one job cannot exhaust shared per-subscription buckets.
Monitor the x-ms-ratelimit-remaining-* headers and alert before they hit 0, not after requests start failing.

Quick Command Reference

# Confirm a 429 and read the backoff value
az <command> --debug 2>&1 | grep -iE 'status: 429|Retry-After|TooManyRequests'

# See which bucket is empty
az <command> --debug 2>&1 | grep -iE 'x-ms-ratelimit-remaining-(subscription|resource|tenant)'

# Subscription read budget
az group list --debug 2>&1 | grep -i 'x-ms-ratelimit-remaining-subscription-reads'

# Resource-provider (Compute) budget
az vm list -g <RG> --debug 2>&1 | grep -i 'x-ms-ratelimit-remaining-resource'

# Find failed write operations and their callers (the activity log omits GETs)
az monitor activity-log list --offset 1h \
  --query "[?status.value=='Failed'].{op:operationName.value, caller:caller}" -o table

# Replace per-item loops with one Resource Graph query (az graph needs the resource-graph extension)
az graph query -q "Resources | where type =~ 'microsoft.compute/virtualmachines' | project name, id"

# Reduce client concurrency
terraform apply -parallelism=3

# Verify the bucket recovers
sleep 30; az group list --debug 2>&1 | grep -i 'x-ms-ratelimit-remaining-subscription-reads'

Conclusion

A 429 TooManyRequests means an ARM or resource-provider rate bucket is empty and Azure is telling you to slow down via Retry-After. The usual root causes:

The per-subscription ARM read or write bucket is exhausted by a burst of requests.
A single resource provider (Compute/Network/Storage) hit its own independent limit.
A tight polling loop or high Terraform parallelism outruns the bucket’s refill rate.
Large fan-out automation iterates over many resources without batching or pacing.
A shared service principal causes tenant-level throttling across unrelated jobs.
Clients ignore Retry-After and retry immediately, sustaining the throttle.

Read the x-ms-ratelimit-remaining-* headers to find the empty bucket, then cut concurrency and honor Retry-After — the fix is almost always pacing the caller, not raising a limit.