Post Mortems with AI · By James Joyner IV · 10 min read

Choosing the Right Postmortem Format for the Incident With AI

Not every incident deserves a five-whys. Here's how to pick narrative, timeline, 5-whys, or contributing-factors postmortems—and how AI drafts the right one fast.

  • #postmortems
  • #postmortem
  • #ai
  • #templates
  • #sre

A staff engineer once handed me a fourteen-page five-whys document for a cert that expired because nobody renewed it. The root cause was “the calendar reminder was set to the wrong year.” We spent two hours of a review meeting reverse-engineering profundity out of a clerical mistake, and the real fix—a renewal automation—fit in one sentence. That postmortem got filed and never read, because everyone in the room knew it was theater.

The opposite failure is just as common: a genuinely gnarly multi-team outage crammed into a thin timeline template, all the interesting contributing factors flattened into “deploy caused issue.” The format you choose decides how much of the truth survives. Pick wrong and you either over-investigate trivia or under-investigate the thing that’ll bite you again.

The four formats, and when each earns its keep

There are really only four postmortem shapes I reach for, and most teams default to whichever one their template tool ships with. That’s the mistake.

Narrative. A chronological story in prose. Best for incidents where the sequence of human decisions is the interesting part—a confusing on-call handoff, a misdiagnosis that sent responders down the wrong path for an hour. Narrative preserves the “we believed X, so we did Y” texture that bullet points destroy.

Timeline. A timestamped table of events. Best for incidents where detection and response latency is the story. If your biggest finding is “it broke at 02:14 and we didn’t know until 02:51,” a timeline makes that 37-minute gap impossible to ignore.

Five-whys. A single causal chain drilled to a changeable root. Best for narrow, single-cause incidents with a clean line of causation. Genuinely useful for the cert-expiry incident, provided you stop at the first thing you can fix instead of philosophizing.

Contributing-factors. A list of independent conditions that each had to be true for the incident to happen. Best for the complex stuff: multiple teams, multiple latent bugs, a saturation cascade. Real outages rarely have a root cause—they have five things that lined up. This format refuses to pretend otherwise.
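To make the timeline case concrete, here is a minimal Python sketch of the arithmetic behind that 37-minute gap. The two timestamps come from the example above; the event descriptions and the `minutes_between` helper are hypothetical, and the sketch assumes both events fall on the same day.

```python
from datetime import datetime

# Hypothetical timeline rows: (HH:MM timestamp, event).
# Only the two times come from the example in the text.
TIMELINE = [
    ("02:14", "Service starts failing"),
    ("02:51", "On-call learns about the failure"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


detection_gap = minutes_between(TIMELINE[0][0], TIMELINE[1][0])
print(f"Detection gap: {detection_gap} minutes")  # prints: Detection gap: 37 minutes
```

Putting the computed gap in the first row of the timeline table keeps the main finding in front of the reader instead of leaving it to mental subtraction.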

A prompt that picks the format for you

I don’t trust AI to write the postmortem unprompted, but I absolutely trust it to triage which shape fits, given a rough description. This short-circuits the “we always use five-whys” reflex. Here’s the prompt I keep in my snippets:

You are an SRE helping pick a postmortem format. I'll describe an
incident. Recommend ONE of: narrative, timeline, five-whys, or
contributing-factors. Then give a 2-sentence justification tied to
what the most important lesson seems to be.

Heuristics:
- Five-whys ONLY if there's a single clean causal chain and a fixable root.
- Contributing-factors if 2+ independent conditions or 2+ teams involved.
- Timeline if the key lesson is detection/response latency.
- Narrative if the key lesson is human decision-making under uncertainty.

Incident: <paste a 3-5 sentence description>
Then output a section skeleton for the chosen format, with placeholder
headings only. Do not invent facts.

Feed it “Checkout latency spiked; the payments team’s new retry logic amplified load on an already-degraded database that the data team had silently failed over an hour earlier,” and it won’t hand you a five-whys. It returns contributing-factors, because two teams and two independent conditions are sitting right there in the sentence. That’s the call a tired incident commander gets wrong at 3am, and it’s exactly the kind of fast, low-stakes judgment AI is good at.
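The heuristics in the prompt can also be written down as a plain rule function, which is handy for sanity-checking the model's answer. This is only a sketch: the `IncidentTraits` fields and `pick_format` are hypothetical names, and the order in which the rules are checked is my assumption, since the prompt does not rank them.

```python
from dataclasses import dataclass


@dataclass
class IncidentTraits:
    """Hypothetical summary of an incident, filled in by a human."""
    independent_conditions: int
    teams_involved: int
    single_causal_chain: bool
    fixable_root: bool
    lesson_is_latency: bool
    lesson_is_human_decisions: bool


def pick_format(t: IncidentTraits) -> str:
    # Assumed precedence: complexity first, then the key lesson, then five-whys.
    if t.independent_conditions >= 2 or t.teams_involved >= 2:
        return "contributing-factors"
    if t.lesson_is_latency:
        return "timeline"
    if t.lesson_is_human_decisions:
        return "narrative"
    if t.single_causal_chain and t.fixable_root:
        return "five-whys"
    return "undecided"  # no rule matched; a human should choose


# The checkout incident: two teams, two independent conditions.
checkout = IncidentTraits(
    independent_conditions=2, teams_involved=2,
    single_causal_chain=False, fixable_root=False,
    lesson_is_latency=False, lesson_is_human_decisions=False,
)
# The expired certificate: one team, one clean chain, one fixable root.
cert_expiry = IncidentTraits(
    independent_conditions=1, teams_involved=1,
    single_causal_chain=True, fixable_root=True,
    lesson_is_latency=False, lesson_is_human_decisions=False,
)

print(pick_format(checkout))     # prints: contributing-factors
print(pick_format(cert_expiry))  # prints: five-whys
```

If the model's recommendation disagrees with the rule function, that is a cue to reread the incident description rather than to trust either one blindly.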

Let AI draft the skeleton, then you fill the truth

Once the format is chosen, I have the model generate the empty structure with placeholder headings and a one-line note under each about what belongs there. For a contributing-factors postmortem it gives me something like this, which I then populate by hand from the actual incident channel:

Contributing Factor 1 — Silent database failover (What condition existed? Why was it invisible? Was it reasonable given the info available?)

Contributing Factor 2 — Unbounded retry amplification (What made this safe in testing but dangerous under degradation?)

Trigger (The proximate event that turned latent factors into an outage.)

Why these aligned (What systemic gap let independent factors coincide?)

The skeleton is scaffolding, not content. The model never sees enough to know that the failover was “silent” because the data team’s runbook skipped a notification step—a human writes that. But the model saved me ten minutes of formatting and, more importantly, stopped me from defaulting to the wrong shape. The rule I hold to: AI owns the structure and the tedium, a human owns the analysis and signs off. A postmortem nobody trusts is worse than no postmortem at all, and the fastest way to lose trust is to let a tool assert a root cause it can’t actually see.
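If you would rather not spend a model call on pure formatting, the same scaffolding can be produced by a few lines of Python. This is a minimal sketch, not what the model does internally: `contributing_factors_skeleton` and its placeholder wording are hypothetical, and every `<...>` slot is left for a human to fill from the incident channel.

```python
def contributing_factors_skeleton(factor_count: int) -> str:
    """Build an empty contributing-factors outline with placeholder slots only."""
    if factor_count < 2:
        raise ValueError("contributing-factors needs at least two factors")
    lines = []
    for i in range(1, factor_count + 1):
        lines.append(f"Contributing Factor {i}: <name>")
        lines.append("  (What condition existed? Why was it invisible?)")
    lines.append("Trigger: <proximate event>")
    lines.append("Why these aligned: <systemic gap>")
    return "\n".join(lines)


skeleton = contributing_factors_skeleton(2)
print(skeleton)
```

The function deliberately accepts nothing but a count, so there is no way for it to assert a cause. It only reserves space for one.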

Mixing formats is allowed (and often right)

The dirty secret is that the best postmortems for big incidents are hybrids: a timeline up top to establish the detection gap, then a contributing-factors section for the analysis, with one short narrative paragraph for the part where two engineers disagreed about the diagnosis and the disagreement cost twenty minutes. Tell the AI that explicitly—“timeline section, then contributing-factors, then a short narrative for the diagnostic detour”—and it’ll lay out the combined skeleton. Don’t let a template tool’s rigidity decide your investigation depth.
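A hybrid layout is just an ordered list of format sections, each with a note about what it covers. The sketch below assembles one in Python. The heading names in `FORMAT_HEADINGS` and the `hybrid_skeleton` function are illustrative assumptions, not a standard template.

```python
# Placeholder headings per format. These names are illustrative assumptions.
FORMAT_HEADINGS = {
    "timeline": ["Timestamped events"],
    "contributing-factors": ["Contributing factors", "Trigger", "Why these aligned"],
    "narrative": ["What we believed and what we did"],
    "five-whys": ["Causal chain", "Fixable root"],
}


def hybrid_skeleton(sections: list[tuple[str, str]]) -> str:
    """Join per-format placeholder headings in the order given.

    Each item is (format name, short note on what that section covers).
    """
    lines = []
    for fmt, scope in sections:
        if fmt not in FORMAT_HEADINGS:
            raise ValueError(f"unknown format: {fmt}")
        lines.append(f"[{fmt}] {scope}")
        for heading in FORMAT_HEADINGS[fmt]:
            lines.append(f"  {heading}: <fill in>")
    return "\n".join(lines)


outline = hybrid_skeleton([
    ("timeline", "detection gap"),
    ("contributing-factors", "analysis"),
    ("narrative", "diagnostic detour"),
])
print(outline)
```

Keeping the section order as data makes it easy to paste the same ordering into the prompt, so the model's skeleton and your own outline agree.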

If you want a deeper take on keeping any of these formats honest and readable, I wrote up the blameless template I actually use. And the format-selection prompt above lives alongside the rest of my incident-writing prompts in the prompts library. For more on getting structured value out of incident reviews, the rest of the postmortems category goes deep.

Pick the format that fits the lesson, not the one your tool defaulted to. The format is a decision about how much truth you’re willing to keep.
