MTTR Retro Analyzer: Recurring Time-Sinks Prompt

Analyze a batch of past incidents to find the time-sinks that recur across them — the phase, the step, the manual toil — and rank what to automate first, so you cut MTTR systemically rather than one incident at a time.

Target user

SRE leads and reliability engineers running retros

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a reliability engineer who improves MTTR by studying many incidents at once, not by reacting to the last one. You look for the time-sinks that show up again and again (the same slow phase, the same manual step, the same "waited on X") and you target those. Help me run that analysis.

Paste your corpus:

- A set of past incidents with lifecycle timestamps: [DETECT / ACK / DIAGNOSE / MITIGATE / RESOLVE TIMES PER INCIDENT]
- Their notes/postmortems: [ACTIONS TAKEN, DELAYS, BLOCKERS PER INCIDENT]
- Severity and service labels: [SEV / SERVICE]
- Any known automation already in place: [EXISTING TOOLING]

Analyze for recurring time-sinks:

1. **Phase-level pattern**: across the set, which MTTR phase (detect, ack, diagnose, mitigate, verify) most often dominates recovery time? Report median and p90 per phase and how many incidents each phase was the bottleneck for. Distinguish a systemically slow phase from a few outliers dragging the mean.
2. **Recurring concrete steps**: find the specific repeated time-sinks named in the notes: "waited for vendor", "couldn't find the runbook", "manual approval", "re-derived the same query", "no dashboard for X". Count how often each recurs and the time it tends to cost.
3. **Cluster by cause**: group the time-sinks into themes (observability gaps, manual toil, coordination/handoff, approval/process). Show which theme costs the most cumulative time.
4. **Rank what to automate**: propose concrete interventions (auto-enrichment, runbook automation, pre-approved rollback, better routing) ranked by recurrence × time-cost × feasibility. For each, name which past incidents it would have helped.
5. **Caveats**: call out small-sample bias, missing timestamps, and survivorship (incidents never recorded). Don't over-fit to one dramatic outage.

Output format: a phase table (median/p90 + bottleneck count), a ranked "TOP TIME-SINKS" table (sink | recurrences | est. time cost | theme), and a ranked "AUTOMATE FIRST" list with the incidents each would have helped. Show your arithmetic. Analyze the system and process, never individual responders. You produce the analysis and recommendations; humans decide what to build.

Why this prompt works

This prompt operates in the learn phase, where the biggest MTTR wins are actually made. Improving one incident at a time is reactive and slow; the durable reductions come from spotting the time-sink that recurs across dozens of incidents and removing it once. That requires reading a batch of incidents with statistical discipline, a task well suited to a structured LLM pass over a pasted corpus.

The analysis is deliberately two-level. The phase decomposition (median and p90, plus how many incidents each phase was the bottleneck for) finds where time systemically goes, while the recurring-step extraction finds the concrete repeated toil — the missing runbook, the manual approval, the re-derived query — that the phase numbers alone won’t name. Clustering those into themes and ranking interventions by recurrence times cost times feasibility turns a pile of postmortems into a prioritized automation backlog tied to the specific incidents each fix would have helped.
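To make the arithmetic concrete, here is a minimal Python sketch of the phase table and the ranking score, which you could use to spot-check the numbers the model reports. Everything in it is an illustrative assumption: the incident IDs, the per-phase minutes, the candidate interventions, and the 0-to-1 feasibility scale are made up, and taking the p90 from `statistics.quantiles` with the inclusive method is just one reasonable choice.

```python
from statistics import median, quantiles

PHASES = ["detect", "ack", "diagnose", "mitigate", "verify"]

# Hypothetical corpus: minutes spent per phase, keyed by a made-up incident ID.
incidents = {
    "INC-1": {"detect": 4, "ack": 2, "diagnose": 35, "mitigate": 10, "verify": 5},
    "INC-2": {"detect": 12, "ack": 3, "diagnose": 20, "mitigate": 8, "verify": 4},
    "INC-3": {"detect": 3, "ack": 1, "diagnose": 50, "mitigate": 15, "verify": 6},
    "INC-4": {"detect": 6, "ack": 2, "diagnose": 9, "mitigate": 40, "verify": 5},
}


def p90(values):
    # quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
    return quantiles(values, n=10, method="inclusive")[-1]


def phase_table(incidents):
    """Median, p90, and bottleneck count for each phase."""
    bottleneck_for = {phase: 0 for phase in PHASES}
    for durations in incidents.values():
        # The bottleneck is the phase that took longest in this incident.
        slowest = max(PHASES, key=lambda phase: durations[phase])
        bottleneck_for[slowest] += 1
    table = {}
    for phase in PHASES:
        minutes = [durations[phase] for durations in incidents.values()]
        table[phase] = {
            "median": median(minutes),
            "p90": p90(minutes),
            "bottleneck_for": bottleneck_for[phase],
        }
    return table


def rank_interventions(candidates):
    """Sort by recurrences x minutes lost per occurrence x feasibility."""
    def score(c):
        return c["recurrences"] * c["minutes"] * c["feasibility"]
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates; feasibility is a judgment call on an assumed 0-to-1 scale.
candidates = [
    {"name": "pre-approved rollback", "recurrences": 3, "minutes": 20, "feasibility": 0.8},
    {"name": "runbook automation", "recurrences": 5, "minutes": 15, "feasibility": 0.5},
    {"name": "auto-enrichment", "recurrences": 6, "minutes": 10, "feasibility": 0.9},
]

table = phase_table(incidents)
ranked = rank_interventions(candidates)
for phase, row in table.items():
    print(phase, row)
print([c["name"] for c in ranked])
```

In this made-up data, diagnose is the bottleneck for three of the four incidents, while the one slow mitigate shows up in that phase's p90 rather than its median. That gap between median and p90 is the kind of signal the phase table is meant to surface.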

The guardrails protect the integrity of the conclusion. Demanding medians, p90s, sample sizes, and outlier flags stops a single dramatic outage from skewing priorities, and the hard rule against ranking individuals keeps retros blameless, which is what keeps the underlying data honest in the first place. The AI produces the analysis; humans decide what to build, so the learn phase stays grounded in real, defensible patterns rather than confident noise.
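A small Python sketch of the outlier and sample-size guardrail, under stated assumptions: the thresholds `MIN_SAMPLE` and `OUTLIER_FACTOR` are arbitrary illustrative values, the durations are invented, and "more than three times the median" is only one simple way to flag an outlier.

```python
from statistics import mean, median

MIN_SAMPLE = 10      # assumed cutoff below which results are marked as small-sample
OUTLIER_FACTOR = 3   # assumed rule: flag durations above 3x the phase median


def check_phase(durations):
    """Summarize one phase and flag small samples and outliers."""
    med = median(durations)
    outliers = [d for d in durations if d > OUTLIER_FACTOR * med]
    return {
        "n": len(durations),
        "small_sample": len(durations) < MIN_SAMPLE,
        "median": med,
        "mean": mean(durations),
        "outliers": outliers,
    }


# Hypothetical diagnose times in minutes; one dramatic outage took 240.
diagnose_minutes = [20, 25, 22, 30, 24, 240]
report = check_phase(diagnose_minutes)
print(report)
```

Here the single 240-minute incident pulls the mean to roughly 60 minutes while the median stays at 24.5, so prioritizing on the mean would overstate how slow diagnosis usually is. The `small_sample` flag is a reminder that six incidents are thin evidence either way.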

Related prompts

Have We Seen This Before? Symptom-Match Prompt

Post-Fix Verification Checklist Prompt
