Connecting Postmortems to SLO and Error-Budget Impact With AI
An incident in isolation is just a bad day. Here's how to use AI to translate a postmortem into SLO and error-budget terms that change your release posture.
Most postmortems describe an incident as a story: it broke, it was bad for a while, we fixed it, here’s why. That framing is fine for understanding what happened, but it leaves the incident floating in isolation, disconnected from the system you actually use to manage reliability. The same incident expressed in SLO and error-budget terms becomes something else entirely — a data point that tells you whether you’re spending reliability faster than you’re earning it, and whether your next release should ship at all.
The error-budget policy your team wrote during a calm planning offsite — “freeze releases when we’ve burned 80% of the quarter’s availability budget” — only means something if incidents get translated into budget terms. Most don’t. So the policy sits in a wiki, the incidents pile up as stories, and the connection that should govern release posture never gets made. Making that translation is a small, mechanical, easy-to-get-subtly-wrong calculation, which is a good fit for AI with one firm rule: it computes only from numbers you give it.
From “it was bad” to a budget number
The translation has a few steps and the arithmetic isn’t hard, but it’s the kind of thing humans skip at the end of a review and occasionally fumble. Map the failure to the SLIs it affected, count the bad events over the SLO window, derive how much budget this one incident consumed, and state what’s left.
Connect this incident to its SLO and error-budget impact.
1. Identify which SLI(s) and SLO(s) this affected and how the
failure maps to each (availability, latency, error rate).
2. Compute error-budget impact FROM MY NUMBERS: bad events over
the window, budget consumed by this incident, resulting
remaining budget. Show the arithmetic. If a number is missing,
list it as [NEEDED INPUT] — do not guess.
3. State budget posture: over / near / within budget, and does the
stated policy trigger anything (freeze, review, notification)?
4. Write a 2-3 sentence plain-language "SLO impact" section for a
mixed audience.
5. Note any SLO this incident suggests is mis-specified.
Rules: Never invent traffic, budget, or target numbers. Compute
only from my inputs; mark gaps [NEEDED INPUT].
Impact: <affected SLI, duration, affected traffic/volume>
SLOs: <targets, window, current budget remaining>
Policy: <thresholds and what they trigger>
Show the work, because the conclusion moves money
The arithmetic has to be visible, because the output drives a real decision and a hidden calculation is an untrustworthy one. After the human confirms the inputs, it reads like this:
SLO impact
Affected SLO: Checkout availability, target 99.9% over a 30-day window.
- 30-day budget: 0.1% of requests may fail = ~43.2 min of full outage equivalent.
- This incident: checkout returned errors for 12 minutes affecting ~60% of checkout traffic ≈ 7.2 minutes of budget-equivalent burn.
- Budget before incident: 62% remaining. After: ~62% − (7.2 / 43.2) ≈ 62% − 17% = 45% remaining.
Posture: Within budget, but this single incident burned 17% of the 30-day allowance. Policy freeze threshold is at 80% burned (20% remaining); we are not there, but two more incidents of this size would cross it.
Plain-language section: This outage consumed roughly a sixth of our 30-day checkout-reliability budget in twelve minutes. We’re still within policy with 45% of the budget left, but we have less room for the rest of the window than the incident’s short duration suggests.
That last paragraph is the one that makes a non-SRE reader understand why a twelve-minute blip matters. “It was only twelve minutes” and “it burned a sixth of the window’s budget” are the same fact, and only the second one changes behavior.
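The arithmetic in the worked example is small enough to check in code before anyone acts on it. Here is a minimal sketch in Python; the function and field names are illustrative, not part of any SLO tooling. It uses the same time-based approximation as the example, which assumes traffic is roughly uniform across the window. If your SLI counts requests, you would substitute real bad-request and total-request counts instead.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class BudgetImpact:
    budget_minutes: float   # full-outage-equivalent minutes allowed in the window
    burn_minutes: float     # full-outage-equivalent minutes this incident used
    burn_fraction: float    # share of the window's budget this incident used
    remaining_after: float  # share of the budget left after the incident


def budget_impact(slo_target, window_days, duration_min,
                  affected_fraction, remaining_before):
    """Time-based approximation; assumes roughly uniform traffic over the window."""
    budget_minutes = (1.0 - slo_target) * window_days * MINUTES_PER_DAY
    burn_minutes = duration_min * affected_fraction
    burn_fraction = burn_minutes / budget_minutes
    return BudgetImpact(budget_minutes, burn_minutes, burn_fraction,
                        remaining_before - burn_fraction)


# Inputs from the worked example above, all human-confirmed.
impact = budget_impact(slo_target=0.999, window_days=30, duration_min=12,
                       affected_fraction=0.60, remaining_before=0.62)
print(f"budget: {impact.budget_minutes:.1f} min")
print(f"burn: {impact.burn_minutes:.1f} min ({impact.burn_fraction:.0%} of budget)")
print(f"remaining: {impact.remaining_after:.0%}")
```

Run against the example's inputs, this reproduces the 43.2-minute budget, the 7.2-minute burn, and the drop from 62% to about 45% remaining. Running it beside the model's output is a cheap cross-check on arithmetic that will drive a release decision.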
The inputs are the whole risk
This is the one place where a fabricated number does real damage, so the rule against guessing is absolute. The budget math is only as trustworthy as the traffic figure, the window, and the current remaining budget. If any of those is missing and the model invents it, the conclusion can wrongly trigger a release freeze (stalling the team over phantom burn) or wrongly avoid one (shipping into a budget that’s actually gone). Both are expensive. Marking missing values as [NEEDED INPUT] instead of estimating them keeps the math honest, and a postmortem that says “I need the affected traffic volume to compute this” is far more useful than one that confidently reports a budget figure built on a guess.
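The same rule can be enforced on any script that sits next to the prompt. The sketch below is one way to do it in Python, with hypothetical input names: if a required value is absent, it returns the list of gaps tagged [NEEDED INPUT] and computes nothing, rather than falling back to a default.

```python
NEEDED = "[NEEDED INPUT]"

# Hypothetical input names; rename to match your own postmortem template.
REQUIRED_INPUTS = ("slo_target", "window_days", "duration_min",
                   "affected_fraction", "remaining_before")


def missing_inputs(inputs):
    """Names of required inputs that are absent or None, in a stable order."""
    return [name for name in REQUIRED_INPUTS if inputs.get(name) is None]


def burn_or_gaps(inputs):
    """Compute the burn only when every input is present; never fill in defaults."""
    gaps = missing_inputs(inputs)
    if gaps:
        return {"status": "incomplete",
                "needed": [f"{NEEDED} {name}" for name in gaps]}
    budget_min = (1.0 - inputs["slo_target"]) * inputs["window_days"] * 24 * 60
    burn = inputs["duration_min"] * inputs["affected_fraction"] / budget_min
    return {"status": "ok",
            "burn_fraction": burn,
            "remaining_after": inputs["remaining_before"] - burn}


# The affected-traffic share is missing, so no budget figure is produced.
result = burn_or_gaps({"slo_target": 0.999, "window_days": 30,
                       "duration_min": 12, "remaining_before": 0.62})
print(result)
```

The design choice is that an incomplete result carries no number at all, so a guessed figure cannot leak into the postmortem.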
Mis-specified SLOs, flagged not changed
A useful side effect: walking the incident through your SLOs sometimes reveals that an SLO is wrong. It might be too loose: the incident hurt real customers but barely dented the budget, which means the SLO isn’t protecting what matters. Or it might be too tight: minor blips keep eating budget for impact users never noticed. The prompt can flag these candidates, and they’re worth surfacing. But it must not quietly propose a looser target, because loosening an SLO is a policy decision, and a looser SLO is exactly how you make future incidents invisible. Flag the candidate; let a human decide whether the target or the system should change.
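As a rough illustration of "flag, don't change", the check can be written so that it only ever returns a note for human review. This is a sketch under stated assumptions: the two thresholds are hypothetical placeholders, not recommended values, and `customers_noticed` is a judgment a human supplies from support tickets or similar evidence.

```python
def flag_slo_candidate(burn_fraction, customers_noticed,
                       small_burn=0.02, large_burn=0.10):
    """Return a review note, or None. Never returns a proposed new target.

    small_burn and large_burn are hypothetical thresholds for illustration.
    """
    if customers_noticed and burn_fraction < small_burn:
        return ("possibly too loose: customers were hurt but the budget "
                "barely moved; review whether this SLO protects what matters")
    if not customers_noticed and burn_fraction > large_burn:
        return ("possibly too tight: significant budget burned for impact "
                "users did not notice; review the SLI definition")
    return None


print(flag_slo_candidate(0.01, customers_noticed=True))
print(flag_slo_candidate(0.17, customers_noticed=True))
```

The second call returns None: an incident that customers felt and that burned a meaningful share of budget is the SLO working as intended.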
The human owns the policy trigger
The model computes the burn and states whether policy would trigger. It does not invoke the freeze. Whether to actually halt releases is a call that weighs things outside the budget math — what’s queued to ship, business commitments, the team’s read on stability — and that belongs to the people with the authority to make it. The value of the AI pass is that the decision is now grounded in a correct, visible budget number instead of a hand-wave about how bad the incident felt.
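That split between computing and deciding can be made explicit in code. The sketch below, in Python, reports the posture and whether the stated policy would trigger, and stops there. The 80% freeze threshold comes from the example policy; the `near_margin` value and the definition of "near" are assumptions for illustration.

```python
import math


def budget_posture(remaining_after, burn_fraction,
                   freeze_at_burned=0.80, near_margin=0.10):
    """Report posture against the policy. Takes no action of any kind."""
    burned = 1.0 - remaining_after
    would_trigger = burned >= freeze_at_burned
    if burned >= 1.0:
        posture = "over"
    elif burned >= freeze_at_burned - near_margin:  # "near" is an assumed band
        posture = "near"
    else:
        posture = "within"
    if would_trigger:
        more_incidents = 0
    elif burn_fraction > 0:
        # How many further incidents of this size reach the freeze threshold.
        more_incidents = math.ceil((freeze_at_burned - burned) / burn_fraction)
    else:
        more_incidents = None
    return {"posture": posture,
            "policy_would_trigger": would_trigger,
            "similar_incidents_until_threshold": more_incidents}


# Figures from the worked example: 45% remaining, 17% burned by this incident.
report = budget_posture(remaining_after=0.45, burn_fraction=0.17)
print(report)
```

With the example's figures, the report says "within", no trigger, and two more incidents of this size before the threshold. The function returns a report and nothing else; whether to freeze stays with the people who own that call.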
The SLO-impact prompt is in the prompts library, and it pairs with the customer and business impact work for the fuller picture of what an incident cost — budget on the engineering side, customers and revenue on the business side. Together they turn an incident from a story into two kinds of accounting.
Translate the incident into budget. It’s the difference between a postmortem that explains a bad day and one that tells you whether you can afford to ship tomorrow.