Cloud API Quota and Throttling Incident Triage Prompt

Triage a live incident caused by hitting a cloud-provider API rate limit or service quota, and decide whether to back off, request a quota increase, or shed the work driving the throttling.

Target user

On-call engineers seeing 429s, throttling, or quota-exceeded errors from a cloud provider

Difficulty

Intermediate

Tools

Claude, ChatGPT

You are a seasoned SRE who has been paged because a cloud provider started returning 429s or "quota exceeded," and you know the worst response is to retry harder — which only deepens the throttling. I will paste the situation: the throttling errors and which provider API they come from, what changed (deploy, traffic spike, a new caller, a runaway loop), the current call rate if known, and whether this is a per-second rate limit or a hard service quota. Your job: 1. **Distinguish rate-limit vs. quota** — from the error and pattern, determine whether we're hitting a soft per-second/per-minute rate limit (recoverable by slowing down) or a hard account/service quota (needs a limit increase or fewer resources). The fix differs. 2. **Find the driver** — identify what is generating the calls: a deploy that changed call patterns, a retry storm amplifying a downstream failure, a batch job, or a genuine traffic increase. Rank the candidates. 3. **Check for retry amplification** — explicitly assess whether our own retries are making the throttling worse, and recommend backoff/jitter or a circuit breaker if so. 4. **Mitigation menu** — propose ordered options: throttle our own callers, add exponential backoff, cache/batch the calls, pause the offending job, fail over to another region/account, or request a quota increase. For each: how fast it helps and the tradeoff. 5. **Quota-increase reality check** — if a provider quota increase is needed, note that it may not be instant and recommend a stopgap that doesn't depend on it. 6. **Recovery signal** — the metric (429 rate clearing, call rate under limit) that confirms recovery, and a note to re-check that we didn't just shift the throttling to another quota. Output as: (a) rate-limit vs. quota verdict, (b) the most-likely driver, (c) the ordered mitigation menu with tradeoffs, (d) the recovery signal. Propose; the human applies changes and files any quota request. Don't recommend raising our retry rate or removing backoff. If you can't tell rate-limit from quota from the evidence, say so and give the safe check.

Why this prompt works

Cloud-provider throttling incidents have a counterintuitive trap built in: the instinctive response to errors — retry — is exactly what makes throttling worse, since each retry counts against the same limit you’ve already blown past. This prompt leads with backoff discipline and explicitly forbids recommending faster retries, steering on-call away from the retry storm that turns a brief throttle into a sustained outage.

The rate-limit-versus-quota distinction is the diagnostic fork that determines whether you can fix this yourself in minutes or need to wait on the provider. A soft per-second limit clears the moment you slow down; a hard account quota needs a limit increase that may not be instant. By forcing that classification and demanding a stopgap that doesn’t depend on the provider granting an increase, the prompt keeps you from staking recovery on a support ticket’s timeline.

The guardrails reflect that you don’t control the provider’s side of this. The AI proposes self-throttling, backoff, caching, and failover options that you can act on immediately, and flags when your own retries are amplifying the problem — but the human applies the changes and files any quota request. It’s a clean split between the analysis the model is good at and the actions and external requests that stay with the responder.

Cloud API Quota and Throttling Incident Triage Prompt

Why this prompt works

Related prompts

Cache Stampede and Thundering-Herd Mitigation Prompt

Third-Party and Vendor Coordination During a Major Incident Prompt

Why this prompt works

Related prompts

Cache Stampede and Thundering-Herd Mitigation Prompt

Third-Party and Vendor Coordination During a Major Incident Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet