Postmortem SLO and Error-Budget Impact Prompt
Translate an incident into its SLO and error-budget terms — how much budget it burned, which SLO it threatens, and what that means for release posture — grounded only in the numbers you provide.
- Target user: SRE tying an incident's postmortem to reliability targets and budget policy
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a staff SRE who connects every incident back to the reliability targets it affects. You quantify error-budget impact precisely from the numbers given and refuse to fabricate any value you weren't handed.
I will paste:
[INCIDENT IMPACT: affected SLI(s), the failure/degradation, duration, and affected traffic or request volume]
[SLO DEFINITIONS: the relevant SLOs, their targets, window, and current budget remaining if known]
[POLICY: what happens at budget thresholds, e.g. freeze releases at X% burned, if such a policy exists]
Do the following:
1. Identify which SLI(s) and SLO(s) this incident affected and how the failure maps to each (availability, latency, error rate, etc.).
2. Compute the error-budget impact from the numbers I gave you: bad events over the window, budget consumed by this incident, and resulting remaining budget. Show the arithmetic. If a needed number is missing, list it as [NEEDED INPUT] and do not guess.
3. State the budget posture: are we now over, near, or comfortably within budget, and does the stated policy trigger anything (freeze, review, exec notification)?
4. Write a 2-3 sentence "SLO impact" section for the postmortem in plain language for a mixed audience.
5. Note any SLO that this incident suggests is mis-specified (too loose to have caught the pain, or too tight to be meaningful).
Output format: an SLO-mapping table, the budget arithmetic shown step by step, a posture/policy statement, and the ready-to-paste SLO-impact paragraph.
Guardrails: never invent traffic numbers, budget figures, or SLO targets; compute only from my inputs and mark gaps as [NEEDED INPUT]. Stay blameless and factual. The human owns whether any policy action (like a freeze) is actually invoked.
Why this prompt works
An incident in isolation is just a story about one bad day. An incident expressed in SLO and error-budget terms is a data point in a system you actually manage — it tells you whether you’re spending reliability faster than you’re earning it, and whether your release posture should change. Most postmortems never make this translation, so the incident’s real cost (a chunk of error budget that constrains the next quarter of shipping) stays invisible, and the budget policy that exists on paper never connects to the incidents that should trigger it.
This prompt does the budget arithmetic explicitly and shows its work, which matters because the math is easy to get subtly wrong and the conclusion drives real decisions. Mapping the failure to specific SLIs, computing bad-events-over-window, and deriving remaining budget turns “it was bad for a while” into “this burned 40% of the quarter’s availability budget, leaving 12%, which crosses the freeze threshold.” That sentence changes behavior in a way the raw incident never could. The plain-language SLO-impact paragraph then makes it legible to the mixed audience who reads the postmortem but doesn’t live in the budget dashboards.
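To make that arithmetic concrete, here is a minimal sketch of the calculation for a request-based availability SLO. Everything in it is an assumption for illustration: the function name `budget_impact`, the 99.9% target, the 100,000,000 requests in the window, the 40,000 failed requests, and the 52% of budget remaining before the incident are hypothetical values chosen to reproduce the "burned 40%, leaving 12%" sentence above. A time-based or latency SLO would need a different definition of "bad event".

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class BudgetImpact:
    allowed_bad_events: Fraction  # bad events the SLO tolerates over the whole window
    consumed: Fraction            # share of the total budget this incident burned
    remaining_after: Fraction     # share of the total budget left after the incident


def budget_impact(slo_target: str, total_events: int, bad_events: int,
                  remaining_before: str) -> BudgetImpact:
    """Compute error-budget impact for an event-based SLO (hypothetical helper).

    Targets and shares are passed as decimal strings so Fraction keeps the
    arithmetic exact instead of accumulating float rounding error.
    """
    target = Fraction(slo_target)
    allowed = (1 - target) * total_events  # total error budget, in events
    if allowed <= 0:
        raise ValueError("SLO target leaves no error budget to measure against")
    consumed = Fraction(bad_events) / allowed
    remaining_after = Fraction(remaining_before) - consumed
    return BudgetImpact(allowed, consumed, remaining_after)


# Hypothetical inputs: 99.9% target, 100M requests in the window,
# 40,000 failed requests during the incident, 52% of budget left beforehand.
impact = budget_impact("0.999", 100_000_000, 40_000, "0.52")
print(f"allowed bad events: {int(impact.allowed_bad_events):,}")
print(f"budget consumed:    {float(impact.consumed):.0%}")
print(f"budget remaining:   {float(impact.remaining_after):.0%}")
```

With these assumed inputs the script prints 100,000 allowed bad events, 40% consumed, and 12% remaining. A negative `remaining_after` would mean the budget is overspent, which is the "over budget" posture the prompt asks the model to state.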
The guardrails guard against the one thing that makes this dangerous: the math is only as trustworthy as its inputs. A fabricated traffic number or invented budget figure could wrongly trigger a release freeze or, worse, wrongly avoid one, so the prompt forbids guessing and requires missing values to be marked as [NEEDED INPUT]. The model is asked to flag an SLO that looks mis-specified (too loose to have caught real pain, too tight to mean anything), but only to flag it: choosing a looser target is a policy decision that could hide the next incident, and it stays with the team. The human owns whether the policy actually fires; the model just makes the budget consequences visible and correct.
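The same "no guessing" rule is easy to mirror if you pre-check inputs before pasting them into the prompt. The sketch below is one possible shape, not part of the prompt itself: the field names in `REQUIRED`, the helpers `missing_inputs` and `policy_triggered`, and the 20% freeze threshold are all hypothetical.

```python
from typing import Optional

# Hypothetical names for the values the budget arithmetic cannot proceed without.
REQUIRED = ("slo_target", "total_events", "bad_events", "remaining_before")


def missing_inputs(inputs: dict) -> list[str]:
    """List every required value that was not supplied, instead of guessing it."""
    return [f"[NEEDED INPUT] {name}" for name in REQUIRED if inputs.get(name) is None]


def policy_triggered(remaining_after: float,
                     freeze_below: Optional[float]) -> Optional[bool]:
    """Report whether a freeze threshold is crossed; None means no policy was given.

    This only reports the condition. Whether a freeze is actually invoked
    remains a human decision.
    """
    if freeze_below is None:
        return None
    return remaining_after < freeze_below


# Hypothetical incident record with the window's request volume missing.
inputs = {"slo_target": "0.999", "total_events": None,
          "bad_events": 40_000, "remaining_before": "0.52"}
print(missing_inputs(inputs))          # the gap is reported, not filled in
print(policy_triggered(0.12, 0.20))    # assumed policy: freeze below 20% remaining
print(policy_triggered(0.12, None))    # no policy supplied, so nothing to evaluate
```

Returning `None` when no policy was supplied keeps "no threshold crossed" distinct from "no threshold known", which is the same distinction the [NEEDED INPUT] marker preserves in the model's output.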