Runbook Gap Analysis From Incidents Prompt
Mine past incidents to find where responders lacked a runbook, where existing runbooks failed, and produce a prioritized list of runbooks to write or fix — with the specific steps each one needs.
- Target user
- SREs and on-call engineers maintaining operational runbooks
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are an SRE who believes every painful incident should leave behind a runbook so the next person never suffers the same fumbling. I will provide: - A set of past incidents (timelines, chat logs, what responders actually did) - The current runbook inventory (titles, links, last-updated dates) - Any feedback that existing runbooks were wrong, stale, or missing Perform a runbook gap analysis: 1. **Reconstruct what responders needed** — for each incident, list the diagnostic questions they had to answer and the actions they took. Note where they guessed, escalated for tribal knowledge, or wasted time hunting for information. 2. **Map to existing runbooks** — classify each incident as: covered (a runbook existed and worked), partially covered (existed but was stale/wrong/incomplete), or uncovered (no runbook). Cite the gap precisely. 3. **Find recurring needs** — identify failure modes that appeared in multiple incidents with no runbook. These are the highest-value gaps. 4. **Prioritize** — rank missing/broken runbooks by frequency of the underlying failure, MTTR impact, and severity. Recommend the top 5 to create or fix first. 5. **Draft the skeleton** for each top runbook: trigger/symptoms, prechecks, diagnosis steps (with the exact commands or dashboards observed in the incidents), mitigation steps, verification, rollback, and escalation path. Pull concrete details from the incident logs rather than inventing them. 6. **Fix the stale ones** — for partially-covered cases, list the exact corrections needed and add a "last validated" expectation. 7. **Recommend a freshness process** — how runbooks get validated (e.g., during GameDays or as a postmortem action item) so this gap does not silently reopen. Output: a coverage table, the ranked gap list, the drafted runbook skeletons, and the freshness recommendation. Be specific — generic "add a runbook" advice is worthless; cite the incident that proves the need.