Five-Whys vs Causal Graph Analysis Prompt
Run the same incident through both a linear five-whys chain and a multi-cause causal graph, then compare them so you don't collapse a systemic failure into one tidy root cause.
- Target user: SRE / incident analyst deciding how deep the causal analysis needs to go
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a staff SRE who has watched five-whys flatten complex incidents into a single misleading "root cause." You know when a linear chain is honest and when an incident needs a causal graph with multiple contributing factors. You will run both and compare them.

I will paste:
[INCIDENT SUMMARY: what broke, blast radius, duration]
[VERIFIED FACTS: confirmed events, signals, and conditions, kept separate from speculation]
[OPEN QUESTIONS: things we still don't know]

Do the following:
1. Run a five-whys chain: start from the customer-facing failure and ask "why" repeatedly until you hit a systemic condition rather than a person or a single bug. Show every step.
2. Build a causal graph for the same incident: list the distinct contributing factors (trigger, latent conditions, missing guardrails, detection gaps) and how they combine. Note where two factors were each necessary but neither sufficient alone.
3. Compare the two: where does the linear chain hide a contributing factor the graph exposes? Where is the graph over-engineering a genuinely simple incident?
4. Recommend which model fits this incident and say why in one sentence.
5. List the open questions that would change the analysis if answered.

Output format: the five-whys chain, then the causal graph as a factor list with combination notes, then a short "which model fits and why," then open questions.

Guardrails: stay blameless. Every "why" must land on a system, process, or signal, never on a person's competence. Mark anything not in [VERIFIED FACTS] as [UNVERIFIED]. This is analysis to inform the writeup; I own the final causal narrative.
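If you fill in the three bracketed inputs from a script rather than by hand, a small helper can keep verified facts and open questions in separate sections. This is a minimal sketch: the function name `build_incident_input` and the section layout are assumptions made for illustration, not part of the prompt itself.

```python
def build_incident_input(summary, verified_facts, open_questions):
    """Assemble the three pasted inputs as plain text (hypothetical helper)."""

    def bullets(items):
        # Render each item as a bullet; show an explicit marker for an empty list.
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    return (
        f"INCIDENT SUMMARY:\n{summary.strip()}\n\n"
        f"VERIFIED FACTS:\n{bullets(verified_facts)}\n\n"
        f"OPEN QUESTIONS:\n{bullets(open_questions)}"
    )


if __name__ == "__main__":
    # Illustrative values only; not taken from a real incident.
    print(build_incident_input(
        "Checkout API returned errors for a subset of requests.",
        ["Config change was deployed before the errors began"],
        ["Why did the alert not fire?"],
    ))
```

Append the result after the prompt text. Keeping speculation out of the verified-facts list is what makes the [UNVERIFIED] guardrail meaningful.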
Why this prompt works
Five-whys is the most widely taught and most widely misused tool in incident analysis. Its appeal is its weakness: it produces a single, linear, satisfying answer, and most real outages are not single or linear. They are the intersection of a trigger, some latent conditions that were fine until they weren’t, and a detection gap that let the whole thing run longer than it should have. Force that into one chain and you get a “root cause” that is really just the last link you had the patience to reach.
This prompt refuses to pick a method up front. It runs the linear chain because sometimes the incident genuinely is linear and the five-whys is honest and fast. Then it builds a causal graph for the same incident and explicitly hunts for the factors the chain hid — the “necessary but not sufficient” combinations that linear thinking erases. The comparison step is the real product: it tells you whether you’re looking at a simple incident dressed up as complex, or a complex one being collapsed into something falsely simple.
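The "necessary but not sufficient" idea can be made concrete with a toy model. The sketch below is illustrative only: the factor names and the `outage_occurs` rule are invented for the example, and a real incident rarely reduces to a clean boolean function.

```python
# Hypothetical contributing factors for an invented incident.
FACTORS = ["bad_config_push", "no_canary_stage", "stale_alert_threshold"]


def outage_occurs(present):
    """Toy rule: the outage needs the trigger AND the missing guardrail.

    The stale alert threshold is a detection gap: in this model it affects
    how long the outage runs, not whether it happens.
    """
    return "bad_config_push" in present and "no_canary_stage" in present


def is_necessary(factor, all_factors):
    """With every factor present, does removing this one prevent the outage?"""
    everything = set(all_factors)
    return outage_occurs(everything) and not outage_occurs(everything - {factor})


def is_sufficient(factor):
    """Does this factor alone, with nothing else present, cause the outage?"""
    return outage_occurs({factor})


def classify(all_factors):
    """Label each factor as necessary and/or sufficient under the toy rule."""
    return {
        f: {"necessary": is_necessary(f, all_factors), "sufficient": is_sufficient(f)}
        for f in all_factors
    }


if __name__ == "__main__":
    for name, labels in classify(FACTORS).items():
        print(name, labels)
```

Under this toy rule, the config push and the missing canary stage are each necessary and neither is sufficient, which is exactly the combination a single linear chain tends to drop: it would stop at whichever of the two it reached first.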
The blameless guardrail does specific work here. Five-whys has a notorious habit of terminating on a person (“why did it ship? the engineer didn’t test it”), which is a stopping bug disguised as an answer. By requiring every chain to land on a system or signal condition, and by marking everything outside the verified facts as unverified, the prompt keeps the analysis pointed at what you can actually fix — and leaves the final causal narrative where it belongs, with the human writing the doc.
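Both guardrails can be spot-checked mechanically on the model's output before you read it closely. The sketch below is a rough illustration under stated assumptions: the exact-match comparison and the `PERSON_TERMS` word list are hypothetical simplifications, and neither replaces a human reviewer.

```python
import re

# Hypothetical word list; a keyword match is a crude proxy for "lands on a person".
PERSON_TERMS = re.compile(r"\b(engineer|developer|operator|someone)\b", re.IGNORECASE)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def mark_unverified(claims, verified_facts):
    """Prefix any claim that does not match a verified fact with [UNVERIFIED]."""
    verified = {normalize(fact) for fact in verified_facts}
    return [
        claim if normalize(claim) in verified else f"[UNVERIFIED] {claim}"
        for claim in claims
    ]


def lands_on_person(final_why):
    """Flag a five-whys chain whose last step names a person rather than a system."""
    return bool(PERSON_TERMS.search(final_why))


if __name__ == "__main__":
    # Illustrative inputs only.
    print(mark_unverified(
        ["Deploy started at 14:02", "Cache was cold"],
        ["deploy started at 14:02"],
    ))
    print(lands_on_person("The engineer didn't test it"))
```

A flagged final step is a cue to ask one more "why" about the system around that person, for example which check or signal would have caught the change regardless of who shipped it.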