Automation Error Guide: '429 Too Many Requests' Downstream API Rate Limit in a Job
Fix 429 Too Many Requests from a downstream API in automation jobs: diagnose burst calls, missing backoff, ignored Retry-After, shared quota, and concurrency fan-out.
- #automation
- #troubleshooting
- #errors
- #rate-limiting
Overview
A 429 Too Many Requests means a downstream API rejected your call because you exceeded its rate limit. The server is healthy and your request is well-formed — it’s simply throttling you. In automation, this commonly happens when a job iterates over a large list and fires requests faster than the API’s per-second or per-minute quota allows, or when many parallel workers share one quota. Without backoff that honors the server’s Retry-After, the job either fails outright or makes the throttling worse by retrying immediately.
You will see this in the job log:
ERROR step=enrich-contacts http 429 Too Many Requests url=https://api.crm.example.com/v1/contacts/lookup
INFO response headers: Retry-After: 12 X-RateLimit-Remaining: 0 X-RateLimit-Reset: 1718900000
And the API’s body usually explains the limit:
{ "error": "rate_limited", "message": "API rate limit exceeded: 100 requests per minute" }
It occurs whenever the job’s call rate crosses the limit — a nightly batch over thousands of items, a fan-out across worker pods, or a retry storm after a transient blip. A job that ran under a small dataset can start 429ing the moment the dataset (or the worker count) grows.
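The headers in the log excerpt above are enough to work out how long to wait. The following is a minimal sketch, assuming the header names shown in that excerpt; `parseRateLimitHeaders` and `RateLimitInfo` are hypothetical names, and other providers may use different headers or units. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```typescript
// Hypothetical helper; header names are taken from the log excerpt above.
interface RateLimitInfo {
  retryAfterMs: number | null;   // how long the server asked us to wait
  remaining: number | null;      // X-RateLimit-Remaining
  resetEpochSec: number | null;  // X-RateLimit-Reset (assumed to be epoch seconds)
}

function parseRateLimitHeaders(
  headers: Record<string, string>,
  nowMs: number = Date.now(),
): RateLimitInfo {
  // HTTP header names are case-insensitive, so normalize the keys first.
  const h: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    h[key.toLowerCase()] = headers[key].trim();
  }

  const toInt = (s: string | undefined): number | null =>
    s !== undefined && /^\d+$/.test(s) ? Number(s) : null;

  let retryAfterMs: number | null = null;
  const raw = h["retry-after"];
  if (raw !== undefined) {
    const seconds = toInt(raw);
    if (seconds !== null) {
      retryAfterMs = seconds * 1000;          // delay-seconds form
    } else {
      const when = Date.parse(raw);           // HTTP-date form
      if (!isNaN(when)) retryAfterMs = Math.max(0, when - nowMs);
    }
  }

  return {
    retryAfterMs,
    remaining: toInt(h["x-ratelimit-remaining"]),
    resetEpochSec: toInt(h["x-ratelimit-reset"]),
  };
}
```

Passing the clock in as `nowMs` keeps the function deterministic, which makes it easy to unit-test the HTTP-date branch.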
Symptoms
- Job steps fail with 429 / “rate limit exceeded”; the API is otherwise reachable.
- X-RateLimit-Remaining: 0 and a Retry-After header on the response.
- 429s cluster in bursts (start of a batch, or during a fan-out) rather than randomly.
- Naive retries make it worse — a retry storm keeps the remaining quota at 0.
# Count 429s and inspect rate-limit headers in the job log
grep -c "http 429" /var/log/jobs/enrich-contacts.log
grep -iE "Retry-After|X-RateLimit" /var/log/jobs/enrich-contacts.log | tail -3
317
Retry-After: 12
X-RateLimit-Remaining: 0
X-RateLimit-Limit: 100
# Reproduce and read the live limit headers
curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" \
"https://api.crm.example.com/v1/contacts/lookup?q=test" | grep -iE "429|ratelimit|retry-after"
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Remaining: 0
Common Root Causes
1. Unthrottled loop fires faster than the limit
The job iterates a list with no pacing, sending far more than the allowed requests/second.
grep -RniE "for .*await|map\(.*await|forEach.*fetch|Promise.all" ./jobs | head
jobs/enrich.ts:22: await Promise.all(items.map(i => api.lookup(i))) // unbounded fan-out
Promise.all over thousands of items launches them all at once — instant quota exhaustion.
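One way to replace the unbounded fan-out is a small worker pool that keeps only a fixed number of calls in flight. This is a sketch under stated assumptions, not the job's actual code: `mapWithConcurrency` is a hypothetical helper, and `fn` stands in for something like `api.lookup`. Note that it bounds concurrency only; staying under a per-minute quota also needs pacing.

```typescript
// Hypothetical helper: runs fn over items with at most `concurrency` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= items.length) return;
      results[i] = await fn(items[i], i); // results keep input order
    }
  }

  // Start a fixed number of workers instead of one promise per item.
  const workerCount = Math.min(Math.max(1, concurrency), items.length);
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

With this in place, the line flagged by the grep could become something like `await mapWithConcurrency(items, 2, (i) => api.lookup(i))`.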
2. No backoff / Retry-After ignored
On a 429 the job either fails or retries immediately, ignoring the Retry-After the server provided.
grep -RniE "Retry-After|retryAfter|backoff|sleep|setTimeout" ./jobs | head
(no matches)
No backoff logic means retries hit the still-exhausted quota and 429 again.
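A retry wrapper that waits for the server-provided delay might look like the sketch below. The names `callWithRetry` and `ApiResponse`, the 20% jitter ceiling, and the fallback delay are all assumptions chosen for illustration; `call` stands in for the real API request.

```typescript
// Hypothetical response shape: only the fields the retry logic needs.
interface ApiResponse<T> {
  status: number;
  retryAfterSec?: number; // parsed from the Retry-After header, if present
  body?: T;
}

interface RetryOptions {
  maxRetries: number;
  fallbackMs: number;                      // wait used when Retry-After is absent
  sleep?: (ms: number) => Promise<void>;   // injectable for tests
  random?: () => number;                   // injectable for tests
}

async function callWithRetry<T>(
  call: () => Promise<ApiResponse<T>>,
  opts: RetryOptions,
): Promise<ApiResponse<T>> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;

  let res = await call();
  for (let attempt = 0; attempt < opts.maxRetries && res.status === 429; attempt++) {
    const baseMs =
      res.retryAfterSec !== undefined ? res.retryAfterSec * 1000 : opts.fallbackMs;
    // Wait at least what the server asked for, plus up to 20% jitter so that
    // concurrent callers do not all wake at the same instant.
    await sleep(baseMs + Math.floor(random() * (baseMs / 5)));
    res = await call();
  }
  return res; // still a 429 if the retries were exhausted; the caller decides what to do
}
```

Jitter is only ever added on top of `Retry-After`, never subtracted, so a retry cannot arrive before the server's stated window.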
3. Concurrency / fan-out across many workers sharing one quota
The limit is per account/key, but N parallel workers each think they have headroom; combined they blow the shared quota.
grep -RniE "replicas|concurrency|WORKER_COUNT|parallelism" ./deploy ./jobs | head
deploy/job.yaml: parallelism: 20
20 parallel workers against a 100 req/min limit means each can only do ~5/min — coordination is required.
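The simplest form of coordination is a static split of the shared quota, which the arithmetic below works through. This is only a sketch: `perWorkerBudget` is a hypothetical helper and the optional safety factor is an assumption. A static split wastes quota when workers are unevenly loaded, which is where a shared limiter does better.

```typescript
// Hypothetical helper: divide one shared per-minute quota evenly across workers.
function perWorkerBudget(
  limitPerMin: number,
  workers: number,
  safety: number = 1, // e.g. 0.85 to leave headroom below the published limit
): { perMin: number; intervalMs: number } {
  const perMin = Math.floor((limitPerMin * safety) / workers);
  // With more workers than quota, a worker's share rounds down to zero.
  const intervalMs = perMin > 0 ? Math.ceil(60000 / perMin) : Infinity;
  return { perMin, intervalMs };
}

// 100 req/min shared by 20 workers: 5 requests per minute each,
// that is one request every 12 seconds per worker.
const share = perWorkerBudget(100, 20);
console.log(share.perMin, share.intervalMs);
```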
4. Per-key vs per-IP limit confusion
You scaled out IPs expecting more headroom, but the limit is per API key (or per account), so distributing across hosts doesn’t help.
# Same key everywhere?
grep -RniE "API_KEY|Authorization" ./jobs ./deploy | sort -u | head
deploy/job.yaml: API_KEY=sk_live_shared_one_key
One shared key means one shared quota regardless of how many IPs you spread across.
5. Retry storm amplifying the limit
A transient failure triggers retries from many in-flight requests at once, spiking the rate and self-inflicting 429s.
grep -RniE "maxRetries|retries:|retryCount" ./jobs | head
grep -c "http 429" /var/log/jobs/enrich-contacts.log
jobs/enrich.ts:31: retries: 5 // immediate, no jitter
317
Five immediate retries with no jitter multiplies the request rate exactly when quota is scarce.
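A common way to defuse a retry storm is exponential backoff with jitter, so that retries from many in-flight requests spread out over time instead of landing together. The sketch below uses the "full jitter" variant (a uniformly random delay between zero and an exponentially growing ceiling); `backoffDelayMs` and its parameters are hypothetical, and when the server sends `Retry-After` that value should take precedence as a minimum wait.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  // The ceiling doubles with each attempt but never exceeds capMs.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick uniformly in [0, ceiling) so callers desynchronize.
  return Math.floor(random() * ceiling);
}

// Ceilings for five retries with a 500 ms base and a 30 s cap:
// 500, 1000, 2000, 4000, 8000 ms (the actual delay is random below each ceiling).
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(attempt, backoffDelayMs(attempt, 500, 30000));
}
```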
6. Limit lower than assumed / changed by the provider
The job is paced to an old, higher limit; the provider lowered it (or you’re on a smaller tier) so the existing pacing now exceeds it.
curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" <URL> | grep -i "X-RateLimit-Limit"
X-RateLimit-Limit: 60
If the job paces to 100/min but the header now says 60, the pacing itself is the problem.
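Deriving the pacing from the header value, rather than hard-coding it, avoids this mismatch. The following sketch shows the arithmetic; `pacingIntervalMs`, `rateForInterval`, and the 0.85 safety factor are assumptions for illustration.

```typescript
// Hypothetical helpers for converting between a per-minute limit and request spacing.
function pacingIntervalMs(limitPerMin: number, safety: number = 0.85): number {
  // Spacing between requests that keeps the rate at `safety` times the limit.
  return Math.ceil(60000 / (limitPerMin * safety));
}

function rateForInterval(intervalMs: number): number {
  return 60000 / intervalMs; // requests per minute produced by a fixed spacing
}

// Pacing tuned for the old 100/min limit: about 706 ms between requests.
console.log(pacingIntervalMs(100));
// The same job against the new 60/min limit needs about 1177 ms.
console.log(pacingIntervalMs(60));
// A 700 ms spacing sends roughly 85.7 requests per minute, which exceeds 60.
console.log(rateForInterval(700) > 60);
```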
Diagnostic Workflow
Step 1: Confirm 429 and read the limit headers
curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" <URL> \
| grep -iE "429|X-RateLimit-Limit|X-RateLimit-Remaining|Retry-After"
This gives you the actual limit, remaining quota, and required wait — the numbers you must pace to.
Step 2: Measure your job’s real request rate
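# Requests per minute the job is actually sending
# (assumes each log line starts with an ISO-8601 timestamp; its first 16 characters are YYYY-MM-DDTHH:MM)
grep "http " /var/log/jobs/enrich-contacts.log \
| awk '{print substr($1,1,16)}' | sort | uniq -c | tail -5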
Compare requests/min to X-RateLimit-Limit; if it exceeds the limit, that’s your gap.
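The same per-minute bucketing can be done in code when the comparison against the limit should be automated. This sketch assumes, as the shell pipeline does, that each log line begins with an ISO-8601 timestamp; that prefix does not appear in the log excerpt earlier in this guide, so adjust the pattern to the real format. `countPerMinute` and `minutesOverLimit` are hypothetical names.

```typescript
// Hypothetical helpers: bucket request log lines by minute and flag minutes over the limit.
function countPerMinute(logLines: readonly string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    if (line.indexOf("http ") === -1) continue;       // same filter as grep "http "
    const m = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})/.exec(line);
    if (m === null) continue;                         // no timestamp prefix: skip the line
    const minute = m[1];                              // e.g. 2024-06-20T16:13
    counts.set(minute, (counts.get(minute) ?? 0) + 1);
  }
  return counts;
}

function minutesOverLimit(counts: Map<string, number>, limitPerMin: number): string[] {
  const over: string[] = [];
  counts.forEach((n, minute) => {
    if (n > limitPerMin) over.push(minute);
  });
  return over.sort();
}
```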
Step 3: Check whether the quota is shared across workers
grep -RniE "parallelism|replicas|concurrency|WORKER_COUNT" ./deploy ./jobs
grep -RniE "API_KEY|Authorization" ./jobs ./deploy | sort -u
Multiple workers + one key = a shared quota that needs a central limiter, not per-worker pacing.
Step 4: Verify backoff honors Retry-After
grep -RniE "Retry-After|retryAfter|backoff|jitter|sleep|setTimeout" ./jobs
Confirm 429s trigger a wait equal to Retry-After (with jitter), not an immediate retry.
Step 5: Add or tune a client-side limiter, then re-run a small batch
# Check whether the job already uses a limiter (pacing target, for example: 1 req / 700 ms ~= 85/min for a 100/min cap)
grep -RniE "limiter|p-limit|bottleneck|rate" ./jobs
Cap concurrency and rate below the published limit, then re-run a bounded batch and watch for zero 429s.
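If no limiter exists yet, a token bucket is one common way to cap the rate. The class below is a minimal in-process sketch, not a drop-in for any particular library; `TokenBucket` and its method names are hypothetical, and the clock is passed in explicitly so the behavior is deterministic. It only bounds a single process, so it does not replace a central limiter for workers sharing one key.

```typescript
// Hypothetical in-process token bucket; time is supplied by the caller in milliseconds.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, ratePerMin: number, nowMs: number) {
    this.capacity = capacity;             // maximum burst size
    this.refillPerMs = ratePerMin / 60000;
    this.tokens = capacity;               // start full
    this.lastMs = nowMs;
  }

  // Returns 0 if a token was taken (send the request now),
  // otherwise the number of milliseconds to wait before trying again.
  tryTake(nowMs: number): number {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs);
  }
}
```

A capacity of 1 at 85 requests per minute yields evenly spaced calls roughly 706 ms apart; a larger capacity allows short bursts while keeping the same long-run rate.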
Example Root Cause Analysis
A nightly enrich-contacts job that used to finish clean now logs hundreds of 429s and aborts partway. The API is up and the token is valid.
Reading the live headers shows the limit and the wait:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Retry-After: 12
The job fans out with Promise.all over the whole contact list:
grep -Rni "Promise.all" ./jobs/enrich.ts
jobs/enrich.ts:22: await Promise.all(items.map(i => api.lookup(i)))
The contact list grew from a few hundred to several thousand. Promise.all launches every lookup simultaneously, so the job sends thousands of requests in a couple of seconds against a 100/min limit — quota hits zero instantly and every subsequent call 429s. There’s no backoff, so it just fails.
Fix: bound concurrency and pace to under the limit, and honor Retry-After on the rare 429:
# Replace Promise.all with a concurrency-limited, rate-paced runner:
# const limit = pLimit(2); ~1 req/700ms; on 429 sleep(Retry-After*1000) then retry
node jobs/enrich.js --max-rate 85/min --concurrency 2
processed 4120 contacts, 0 rate-limit errors
Paced under the 100/min cap, the job completes with no 429s.
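Pacing has a cost in run time that is worth estimating before scheduling the job. The arithmetic below is a sketch using the figures from this example; `estimateMinutes` is a hypothetical helper, and the estimate ignores request latency and any retries.

```typescript
// Hypothetical helper: lower bound on run time for a rate-paced batch.
function estimateMinutes(itemCount: number, ratePerMin: number): number {
  return Math.ceil(itemCount / ratePerMin);
}

// 4120 contacts at 85 requests per minute: 4120 / 85 is about 48.5, so 49 minutes.
console.log(estimateMinutes(4120, 85));
// Even at the full 100/min cap the batch cannot finish in under 42 minutes.
console.log(estimateMinutes(4120, 100));
```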
Prevention Best Practices
- Pace client calls below the published limit with a token-bucket / concurrency limiter; never Promise.all an unbounded list against a rate-limited API.
- Always honor Retry-After (and back off with jitter) on a 429 instead of retrying immediately; immediate retries are a self-inflicted retry storm.
- For multi-worker jobs sharing one API key, enforce the rate centrally (a shared limiter / distributed token bucket), since per-worker pacing doesn’t bound the shared quota.
- Read X-RateLimit-Limit / Remaining from responses and adapt dynamically rather than hard-coding an assumed limit the provider may have lowered.
- Make the job resumable/checkpointed so a throttled batch can continue rather than restarting and re-spiking the rate.
- Alert on 429 rate so a limit change or dataset growth is caught early.
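The checkpointing practice in the list above can be sketched as follows. This is a simplified, synchronous illustration with hypothetical names (`runBatch`, `Outcome`); in a real job `processOne` would be asynchronous and the set of completed IDs would be persisted to disk or a database between runs.

```typescript
// Hypothetical sketch of a resumable batch: completed IDs are skipped on the next run.
type Outcome = "ok" | "throttled";

function runBatch(
  ids: readonly string[],
  alreadyDone: ReadonlySet<string>,
  processOne: (id: string) => Outcome,
): { done: Set<string>; finished: boolean } {
  const done = new Set<string>();
  alreadyDone.forEach((id) => done.add(id)); // copy so the caller's checkpoint is not mutated

  for (const id of ids) {
    if (done.has(id)) continue;              // finished in an earlier run
    if (processOne(id) === "throttled") {
      return { done, finished: false };      // stop here; resume from this checkpoint later
    }
    done.add(id);
  }
  return { done, finished: true };
}
```

On resume, only the items missing from the checkpoint are sent again, so a throttled run does not repeat the calls that already succeeded.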
Quick Command Reference
# Read the live rate-limit headers
curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" <URL> \
| grep -iE "429|X-RateLimit-Limit|X-RateLimit-Remaining|Retry-After"
# Count 429s and inspect headers in the log
grep -c "http 429" /var/log/jobs/<job>.log
grep -iE "Retry-After|X-RateLimit" /var/log/jobs/<job>.log | tail
# Measure the job's actual requests/min
grep "http " /var/log/jobs/<job>.log | awk '{print substr($1,1,16)}' | sort | uniq -c | tail
# Find unbounded fan-out and missing backoff
grep -RniE "Promise.all|forEach.*fetch|Retry-After|backoff|p-limit|bottleneck" ./jobs
# Shared-quota check (workers vs one key)
grep -RniE "parallelism|replicas|concurrency|API_KEY" ./deploy ./jobs | sort -u
Conclusion
A 429 Too Many Requests in a job means you crossed a healthy downstream’s rate limit. The usual root causes:
- An unthrottled loop firing faster than the per-second/minute quota.
- No backoff and the Retry-After header ignored.
- Many parallel workers sharing one per-key quota.
- Confusing a per-key/account limit with a per-IP one.
- A retry storm amplifying the request rate at the worst moment.
- The provider’s limit being lower than the job’s hard-coded assumption.
Read the live X-RateLimit-Limit and your job’s actual request rate first — the gap between them tells you exactly how much to throttle, and honoring Retry-After keeps retries from making it worse.