Writing the What Went Well Section of a Postmortem With AI
Postmortems that are only failure lists teach teams to hide. Here's how to write an honest what-went-well section, with AI surfacing the saves from the timeline.
- #postmortems
- #postmortem
- #ai
- #blameless
- #culture
I watched a junior engineer go quiet in an incident review once. She’d been the one who, four minutes into a database outage, thought to check the read-replica lag and caught that the failover hadn’t promoted cleanly—a save that shaved twenty minutes off the recovery. None of that made it into the postmortem. The document was a clean list of everything that broke and everyone’s gaps. She told me afterward she figured there was no point raising her hand again; the review only seemed to want failures. That’s the cost of a postmortem with no “what went well” section: it quietly teaches your best responders to stop volunteering.
A postmortem that is purely a failure inventory is not blameless, no matter how carefully you scrub the names. It still trains people to associate incident review with exposure. The “what went well” section isn’t a participation trophy—it’s how you find the controls worth protecting and the instincts worth spreading.
Why this section is more than morale
There’s a hard engineering reason to write it. Most reviews obsess over what to add—new alerts, new runbooks, new guardrails. But every incident also exercised your existing defenses, and some of them worked. If you never record which ones, you can’t tell the difference between “we got lucky” and “the circuit breaker did exactly its job.” Next quarter when someone proposes ripping out that circuit breaker to simplify the code, there’s no record that it saved you.
So “what went well” has three real jobs: surface controls that worked (keep them), surface human saves (teach them), and balance the document so people stay honest in the next one. Skip it and you optimize purely for blame-avoidance, which is the opposite of what you wanted.
Where AI actually helps here
The wins in an incident are hiding in the timeline, scattered across the incident channel, and easy to miss because nobody narrates their own good calls. This is a needle-in-haystack reading task—exactly where I let AI go first. I feed it the raw channel transcript and the timeline and ask it to nominate candidate wins, which I then judge.
You are reviewing an incident channel transcript and timeline.
Identify candidate "what went well" items in three categories:
1. CONTROLS THAT WORKED — automated guardrails, alerts, circuit
breakers, rate limits, or rollbacks that fired correctly and
limited damage.
2. HUMAN SAVES — moments where a responder's decision or hunch
meaningfully shortened detection or recovery.
3. PROCESS WINS — handoffs, escalations, or comms that went smoothly.
For each, quote the supporting line(s) from the transcript and note
the timestamp. Do NOT credit anything you can't tie to a quote.
Frame human saves around the ACTION, not praise of the person.
Output as a bulleted list of candidates for human review.
The “tie it to a quote” rule matters. Without it, models pad the section with generic compliments—“the team communicated effectively”—that mean nothing and read as filler. With it, I get items I can verify and a responder gets credit anchored in something they actually did.
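The quote rule is also mechanically checkable before a human reads anything. Here is a minimal sketch, assuming the model's candidates have already been parsed into simple records. The `Candidate` fields, the `split_by_evidence` helper, and the sample transcript lines are all hypothetical, made up for illustration rather than taken from a real tool or incident.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    category: str  # e.g. "HUMAN SAVES" (hypothetical field names)
    summary: str   # the model's one-line description of the win
    quote: str     # the transcript line the model says supports it


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise doesn't break a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_by_evidence(candidates, transcript):
    """Return (verified, unverified): a candidate is verified only if its
    quote appears in the transcript after normalization."""
    haystack = normalize(transcript)
    verified, unverified = [], []
    for candidate in candidates:
        needle = normalize(candidate.quote)
        if needle and needle in haystack:
            verified.append(candidate)
        else:
            unverified.append(candidate)
    return verified, unverified


# Invented sample data, for illustration only.
transcript = (
    "14:18 responder-a: checking replica promotion status, failover looks stalled\n"
    "14:21 alertbot: circuit breaker OPEN on payments-retry\n"
)
candidates = [
    Candidate("HUMAN SAVES", "Replica-lag check caught the stalled failover",
              "checking replica promotion status, failover looks stalled"),
    Candidate("PROCESS WINS", "The team communicated effectively",
              "great communication all around"),
]

verified, unverified = split_by_evidence(candidates, transcript)
for candidate in unverified:
    print(f"No supporting quote found, drop or re-check: {candidate.summary}")
```

A substring match only proves the quote exists in the transcript. Whether it actually supports the claimed win is still a human call.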
Keep it blameless in both directions
Here’s the subtle trap: a “what went well” section can become its own status game if you let it turn into named praise. “Sarah heroically diagnosed the issue” sounds kind, but it sets up an implicit contrast: if Sarah’s the hero, who’s the scapegoat? Blameless cuts both ways. Frame the win as a system property or a transferable action, not a personality.
A good AI-drafted section, after my edit, reads like this:
What went well
- The replica-lag check shortened recovery. Four minutes after the first alert, a responder checked read-replica promotion status and caught that failover had stalled (14:18 UTC). This is worth turning into a standard first-five-minutes check for database incidents.
- The circuit breaker held. The breaker on the payments retry path tripped at 14:21 UTC and kept the upstream from being overwhelmed during the degradation. Without it, the blast radius would have included the cart service.
- Escalation was clean. Sev was bumped from 3 to 2 within six minutes of the symptom widening; the on-call lead pulled in the data team before being asked.
Notice the first item points at a repeatable check, not at a person. That’s the version that makes the next person comfortable saying “actually, I caught something too.”
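A drafted section can be linted for named praise before it goes out. The sketch below assumes you can supply the responder roster yourself. The `flag_named_praise` helper and the `PRAISE_WORDS` list are hypothetical starting points to tune for your team.

```python
import re

# Hypothetical starter list; extend it with whatever your team tends to write.
PRAISE_WORDS = {"hero", "heroic", "heroically", "brilliant", "brilliantly", "rockstar"}


def flag_named_praise(section_lines, responder_names, praise_words=PRAISE_WORDS):
    """Return (line_number, names_found, praise_found) for each line that
    mentions a responder by name or uses a praise word."""
    findings = []
    for number, line in enumerate(section_lines, start=1):
        lowered = line.lower()
        names = [
            name for name in responder_names
            if re.search(rf"\b{re.escape(name.lower())}\b", lowered)
        ]
        praise = sorted(
            word for word in praise_words
            if re.search(rf"\b{re.escape(word)}\b", lowered)
        )
        if names or praise:
            findings.append((number, names, praise))
    return findings


draft = [
    "Sarah heroically diagnosed the issue.",
    "A responder checked read-replica promotion status and caught that failover had stalled (14:18 UTC).",
]

for number, names, praise in flag_named_praise(draft, ["Sarah"]):
    print(f"line {number}: names={names} praise={praise}")
```

A flagged line isn't necessarily wrong. It's a prompt to rewrite the item around the action or the control rather than the person.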
Don’t let it sand down the failures
One caution. The point of this section is balance, not anesthesia. I’ve seen teams use a generous “what went well” to soften an impact section that should have stung. Resist that. The wins are real and worth recording; they don’t cancel the contributing factors. Keep them as separate, honest sections, and never let the model “balance the tone” by trimming the hard findings—a postmortem that flatters everyone is one nobody learns from.
The human owns the judgment of what was genuinely a save versus a lucky coincidence; the AI just reads the haystack and hands you candidates with receipts. That’s the split that keeps this fast and honest.
I keep this prompt in the prompts library next to the rest of the incident-writing set, and the full philosophy on reviews people trust is in the blameless postmortem guide. The rest of the postmortems category covers the other sections in the same spirit.
Write down what worked. It’s the only way to know what to protect.