Postmortem Counterfactual Analysis Prompt

Rigorously explore what would have detected or prevented this incident sooner — testing each counterfactual against what was actually knowable in the moment, so you avoid hindsight-driven action items.

Target user

SRE / incident commander deriving prevention and detection improvements

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a staff SRE trained in counterfactual reasoning for post-incident reviews. You know the trap: hindsight makes every cause look obvious, so "we should have noticed" is usually a story, not a finding. You test each counterfactual against the information available at the time.

I will paste:

[INCIDENT TIMELINE: with timestamps for detection, mitigation, resolution]
[WHAT WAS KNOWN WHEN: signals, dashboards, and alerts that existed and what they showed during the event]
[CONSTRAINTS: tooling, access, on-call load, and time pressure responders were under]

Do the following:

1. Build a "detect sooner" counterfactual set: list candidate signals, alerts, or checks that could have surfaced the problem earlier. For each, evaluate whether the data to fire it actually existed at the time.
2. Build a "prevent entirely" counterfactual set: changes upstream (design, guardrail, test, review gate) that would have stopped the trigger. Assess feasibility and cost honestly.
3. Apply the counterfactual test to each item: could a reasonable responder, with the information and tools available in the moment, realistically have known or done this? Discard or downgrade hindsight-only items and say why.
4. Rank surviving counterfactuals by leverage: how much earlier detection or how much prevention per unit of effort. Note which add prevent vs detect vs mitigate defense.
5. Flag any counterfactual that trades one risk for another (e.g. a tighter alert that would page constantly).

Output format: two tables (Detect-sooner / Prevent), each with columns Counterfactual / Was-it-knowable-then / Feasibility / Leverage, then a short ranked shortlist of the strongest candidates.

Guardrails: stay blameless: frame everything as system and signal gaps, never "the engineer should have seen it." Mark any assumption about what was knowable as [UNVERIFIED] until I confirm it. These are candidate improvements; I own the decision on what becomes an action item.

Why this prompt works

Counterfactual reasoning is the engine of a useful postmortem, but it is also where hindsight bias does the most damage. After the fact, every contributing factor looks like it was waving a flag, and “we should have caught this” feels self-evident. It usually is not. The signal may not have existed, the dashboard may not have shown it, or the responder may have been drowning in pages. Treating those stories as findings produces action items that punish the past instead of improving the future.

This prompt builds counterfactuals in two directions — detect sooner and prevent entirely — and then subjects each to an explicit knowability test: with the information and tools present in the moment, could a reasonable person actually have acted on it? Items that survive are real opportunities; items that fail are downgraded with a stated reason. That discipline is exactly what separates a mature review from a blame exercise dressed in process language.
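The knowability test can also be written as a small filter, which may help when checking the model's table by hand. The sketch below is illustrative only: the `Counterfactual` fields, the verdict strings, and the example timestamps are assumptions made for this example and are not part of the prompt.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class Counterfactual:
    """One candidate "we could have..." item (hypothetical structure)."""
    description: str
    kind: str  # "detect" or "prevent"
    # When the data needed to act on this first existed; None if it never did.
    data_available_at: Optional[datetime]
    # True once the incident owner has confirmed the knowability claim.
    confirmed: bool = False


def knowability_test(cf: Counterfactual, needed_by: datetime) -> Tuple[str, str]:
    """Return (verdict, reason) for a counterfactual.

    needed_by is the latest moment at which acting on the item would
    still have changed the outcome.
    """
    if cf.data_available_at is None:
        return "discard", "the required data did not exist during the incident"
    if cf.data_available_at > needed_by:
        return "downgrade", "the data only existed after it could have helped"
    if not cf.confirmed:
        return "keep [UNVERIFIED]", "the data existed in time, pending confirmation"
    return "keep", "the data existed in time and this is confirmed"


# Hypothetical example: the signal existed ten minutes before it was needed,
# but nobody has confirmed that yet.
needed_by = datetime(2024, 1, 1, 12, 0)
queue_alert = Counterfactual(
    description="Alert on queue depth",
    kind="detect",
    data_available_at=datetime(2024, 1, 1, 11, 50),
)
verdict, reason = knowability_test(queue_alert, needed_by)
print(verdict, "-", reason)
```

Every verdict carries its reason, which mirrors the prompt's requirement that discarded or downgraded items come with a stated explanation.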

Ranking by leverage keeps the output actionable rather than a wish list, and flagging counterfactuals that trade one risk for another (the alert that would page constantly, the gate that would block every deploy) prevents the classic overcorrection. Throughout, the framing stays on signals and systems, and the final call on what becomes an action item stays with the human.
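Ranking by leverage amounts to sorting by benefit per unit of effort while keeping the risk-trade flag visible. The following is a minimal sketch under stated assumptions: the prompt does not prescribe units, so `minutes_saved` and `effort_days` are hypothetical stand-ins for "benefit" and "effort", and the candidate names and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A counterfactual that survived the knowability test (hypothetical structure)."""
    name: str
    defense: str           # "prevent", "detect", or "mitigate"
    minutes_saved: float   # assumed benefit estimate
    effort_days: float     # assumed cost estimate
    trades_risk: bool = False  # e.g. a tighter alert that would page constantly

    def __post_init__(self) -> None:
        if self.effort_days <= 0:
            raise ValueError("effort_days must be positive")

    @property
    def leverage(self) -> float:
        # Benefit per unit of effort; higher is better.
        return self.minutes_saved / self.effort_days


def shortlist(candidates: List[Candidate], top_n: int = 3) -> List[Candidate]:
    """Return the top_n candidates by leverage, highest first."""
    return sorted(candidates, key=lambda c: c.leverage, reverse=True)[:top_n]


# Invented example values, for illustration only.
candidates = [
    Candidate("Queue-depth alert", "detect", minutes_saved=30, effort_days=2),
    Candidate("Schema review gate", "prevent", minutes_saved=45, effort_days=10),
    Candidate("Tighter latency alert", "detect", minutes_saved=10, effort_days=0.5,
              trades_risk=True),
]

for c in shortlist(candidates):
    flag = " (trades one risk for another)" if c.trades_risk else ""
    print(f"{c.name}: {c.defense}, leverage {c.leverage:.1f}{flag}")
```

The flag is reported alongside the score and not folded into it, so a high-leverage item that would page constantly still appears on the shortlist with its trade-off stated, and the human decides.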

Related prompts

Postmortem What-Went-Well Section Writer Prompt
