Cloud API Quota and Throttling Incident Triage Prompt
Triage a live incident caused by hitting a cloud-provider API rate limit or service quota, and decide whether to back off, request a quota increase, or shed the work driving the throttling.
- Target user: On-call engineers seeing 429s, throttling, or quota-exceeded errors from a cloud provider
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a seasoned SRE who has been paged because a cloud provider started returning 429s or "quota exceeded," and you know the worst response is to retry harder, which only deepens the throttling. I will paste the situation: the throttling errors and which provider API they come from, what changed (deploy, traffic spike, a new caller, a runaway loop), the current call rate if known, and whether this is a per-second rate limit or a hard service quota.
Your job:
1. **Distinguish rate-limit vs. quota**: from the error and pattern, determine whether we're hitting a soft per-second/per-minute rate limit (recoverable by slowing down) or a hard account/service quota (needs a limit increase or fewer resources). The fix differs.
2. **Find the driver**: identify what is generating the calls: a deploy that changed call patterns, a retry storm amplifying a downstream failure, a batch job, or a genuine traffic increase. Rank the candidates.
3. **Check for retry amplification**: explicitly assess whether our own retries are making the throttling worse, and recommend backoff/jitter or a circuit breaker if so.
4. **Mitigation menu**: propose ordered options: throttle our own callers, add exponential backoff, cache/batch the calls, pause the offending job, fail over to another region/account, or request a quota increase. For each: how fast it helps and the tradeoff.
5. **Quota-increase reality check**: if a provider quota increase is needed, note that it may not be instant and recommend a stopgap that doesn't depend on it.
6. **Recovery signal**: the metric (429 rate clearing, call rate under limit) that confirms recovery, and a note to re-check that we didn't just shift the throttling to another quota.
Output as: (a) rate-limit vs. quota verdict, (b) the most-likely driver, (c) the ordered mitigation menu with tradeoffs, (d) the recovery signal.
Propose; the human applies changes and files any quota request. Don't recommend raising our retry rate or removing backoff. If you can't tell rate-limit from quota from the evidence, say so and give the safe check.
Why this prompt works
Cloud-provider throttling incidents have a counterintuitive trap built in: the instinctive response to errors — retry — is exactly what makes throttling worse, since each retry counts against the same limit you’ve already blown past. This prompt leads with backoff discipline and explicitly forbids recommending faster retries, steering on-call away from the retry storm that turns a brief throttle into a sustained outage.
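As an illustration of the backoff discipline described above, here is a minimal Python sketch of capped exponential backoff with full jitter. It is not taken from any provider SDK: `ThrottledError`, `backoff_delays`, `call_with_backoff`, and the default `base`, `cap`, and `max_retries` values are all hypothetical names and numbers chosen for the example.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider 429 / throttling error."""


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    for attempt in range(max_retries):
        # The ceiling doubles each attempt but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a point between 0 and the ceiling so that
        # many throttled callers do not all retry at the same instant.
        yield rng() * ceiling


def call_with_backoff(call, max_retries=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Run call(); on ThrottledError wait a jittered delay, retrying at most max_retries times."""
    delays = backoff_delays(max_retries, base, cap, rng)
    while True:
        try:
            return call()
        except ThrottledError:
            delay = next(delays, None)
            if delay is None:
                # Retry budget spent: surface the error instead of hammering the API.
                raise
            sleep(delay)
```

The retry budget is bounded on purpose: once it is spent, the error propagates to the caller, so a sustained throttle does not turn into an unbounded stream of extra requests against the same limit.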
The rate-limit-versus-quota distinction is the diagnostic fork that determines whether you can fix this yourself in minutes or need to wait on the provider. A soft per-second limit clears the moment you slow down; a hard account quota needs a limit increase that may not be instant. By forcing that classification and demanding a stopgap that doesn’t depend on the provider granting an increase, the prompt keeps you from staking recovery on a support ticket’s timeline.
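The classification step can be pictured as a small decision function. The Python sketch below is only a rough heuristic under stated assumptions: the keyword checks are hypothetical (real providers use their own error codes and messages), and `classify_throttle` and `persists_when_slowed` are names invented for this example. The one behavioral signal it trusts most is the one the paragraph above describes: whether errors clear after callers slow down.

```python
from typing import Optional


def classify_throttle(status: int, message: str,
                      retry_after: Optional[str] = None,
                      persists_when_slowed: Optional[bool] = None) -> str:
    """Return 'rate_limit', 'quota', or 'unknown' for a throttling error.

    persists_when_slowed is the responder's observation: did errors continue
    after the call rate was reduced? None means nobody has checked yet.
    """
    # Behavioral evidence outranks wording in the error message.
    if persists_when_slowed is True:
        return "quota"
    if persists_when_slowed is False:
        return "rate_limit"

    text = message.lower()
    # Hypothetical keyword heuristics; adjust to the provider's real error codes.
    mentions_quota = "quota" in text
    looks_rate = (
        status == 429
        or retry_after is not None          # value of an HTTP Retry-After header, if present
        or "throttl" in text
        or "rate limit" in text
        or "rate exceeded" in text
    )

    if mentions_quota and not looks_rate:
        return "quota"
    if looks_rate and not mentions_quota:
        return "rate_limit"
    # Conflicting or missing signals: say so rather than guess.
    return "unknown"
```

For example, `classify_throttle(429, "Quota exceeded")` returns `"unknown"` because the status code and the message point in different directions. In that case the safe check is to reduce the call rate and see whether the errors clear.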
The guardrails reflect that you don’t control the provider’s side of this. The AI proposes self-throttling, backoff, caching, and failover options that you can act on immediately, and flags when your own retries are amplifying the problem — but the human applies the changes and files any quota request. It’s a clean split between the analysis the model is good at and the actions and external requests that stay with the responder.
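One of the self-throttling options a responder can apply without waiting on the provider is a client-side rate cap. The Python sketch below shows a token bucket, a common technique for this; it is an illustrative assumption rather than something the prompt prescribes, and `TokenBucket`, `rate`, and `capacity` are hypothetical names. The rate should be set below the provider's documented limit.

```python
import time


class TokenBucket:
    """Client-side limiter: allows bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Return True and spend `cost` tokens if available; otherwise return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would check `bucket.try_acquire()` before each provider request and queue, delay, or shed the work when it returns `False`, which keeps the decision about what to drop on the responder's side of the boundary.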
Related prompts
- Cache Stampede and Thundering-Herd Mitigation Prompt: Diagnose a live incident where a cache miss, flush, or restart is hammering the origin with a thundering herd, and pick the fastest safe mitigation to protect the backend without dropping all traffic.
- Third-Party and Vendor Coordination During a Major Incident Prompt: Run the playbook for incidents caused by or dependent on an external vendor (cloud provider, CDN, payment processor, SaaS dependency), covering escalation, status correlation, customer comms, and parallel mitigation while you wait on someone else's fix.